DNA
– information storage technology way beyond computers
The body’s most precious substance, the genetic information known as the genome, is stored deep inside the cells, in the tiny nucleus.
If this information were to be written down using the alphabet, it would fill 1,000 books, each having 1,000 pages with 3,000 letters to the page. The human genome (inherited material) consists of three thousand million genetic “letters”. If all these letters were typed in one line, it would extend from the North Pole to the equator.
A good typist working 220 eight-hour days per year, at a typing speed of 300 letters a minute, would require 95 years for this task – much longer than her entire working life! Taking into account the time required for planning and testing up to the implementation of the final system, a scientific programmer produces 40 symbols of program code per day on average.
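The two estimates above can be reproduced with a few lines of arithmetic. The following is only a sketch: the constant and function names are illustrative, and applying the typist's 220 working days per year to the programmer as well is an assumption, since the text gives only a daily rate for programming.

```python
# Figures taken from the text; names are illustrative only.
GENOME_LETTERS = 3_000_000_000      # three thousand million genetic letters
LETTERS_PER_MINUTE = 300            # typing speed
HOURS_PER_DAY = 8
DAYS_PER_YEAR = 220                 # stated for the typist; ASSUMED for the programmer
SYMBOLS_PER_PROGRAMMER_DAY = 40     # average output of a scientific programmer


def typist_years(letters=GENOME_LETTERS):
    """Years one typist needs to type the given number of letters."""
    letters_per_year = LETTERS_PER_MINUTE * 60 * HOURS_PER_DAY * DAYS_PER_YEAR
    return letters / letters_per_year


def programmer_years(symbols=GENOME_LETTERS):
    """Total programmer-years needed at 40 symbols per working day."""
    programmer_days = symbols / SYMBOLS_PER_PROGRAMMER_DAY
    return programmer_days / DAYS_PER_YEAR


print(round(typist_years(), 1))     # 94.7, i.e. about 95 years
print(round(programmer_years()))    # 340909 programmer-years in total
```

Under these assumptions the programming effort comes to roughly 340,000 programmer-years, which can only be covered by a team of several thousand people working for their whole careers.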
It would thus require a team of 8,000 programmers, devoting their entire careers to this project, to program the human genome. But no human programmer knows how to structure this program and fit it into a DNA fibre measuring only one metre if stretched out. [1]

Storage density: The storage medium of genetic information is the double-stranded DNA (chemical name: de(s)oxyribonucleic acid; see diagram on page 74). The volume [2] of human DNA is extremely small, only three thousand-millionths of a cubic millimetre (3 × 10⁻⁹ mm³). Its storage density is enormous, many orders of magnitude higher than that of the latest computer chips. In fact, it is the highest known. Let us try to visualise it: If one could stretch out the head of a pin measuring 2 mm in diameter until it became a thread having the same thickness as a DNA molecule, it would be 33 times as long as the equator! Could you have imagined that? If the genome information were in printed form, it would require 12,000 paperbacks of 160 pages each. Compared to the current 16 Megabit computer chips, a human DNA strand stores an amazing 1,400 times as much information.
To get a further idea of the almost unimaginable density of information in the DNA molecule, imagine you had just enough DNA to fit into a pinhead. Now imagine the information contained in an ordinary pocket paperback of 160 pages.
How many such lots of information could you store in this tiny amount of DNA? The answer is 15 × 10¹² (15 million million). If you actually had that many of these books and put them on top of one another in a pile, it would be 500 times as high as the distance (384,000 km) from the earth to the moon! To put it another way – if you distributed these books equally among the roughly 6 billion people on Earth, each person would get 2,500 copies!

Structure: The total amount of genetic information can be compared to a library, where single books represent chromosomes, and their chapters are the genes. Genes are like entries in a gigantic encyclopedia. There are 23 pairs of chromosomes in the nuclei of our somatic (body) cells, making up the diploid (Greek diplóos = double) number of 46 chromosomes. Single chromosomes can be distinguished according to their total length, the length of the chromosomal arms, and the position of the centromere, the point at which they are constricted. With the exception of the sex chromosomes, the chromosomes from each parent correspond to those from the other parent in regard to the type and sequence of the hereditary characteristics.
Women have two equal sized sex chromosomes (XX), but men have a larger and a smaller sex chromosome (XY).
The 23 human chromosome pairs comprise a double complement of approximately 30,000 inherited characteristics or genes (earlier estimates put the figure at about 100,000). Every gene occurs twice, one copy derived from the mother and the other from the father; the body cells are therefore described as diploid. In contrast to the body cells, the germ cells (egg and sperm cells) have a single complement of chromosomes, called haploid (Greek haplous = single). Since the 30,000 genes are shared among 23 chromosomes, each chromosome carries about 1,300 genes on average.
The DNA molecules of bacteria, when stretched out, are around one millimetre long. This corresponds to around 3 × 10⁶ nucleotide pairs. The well-known bacterium E. coli has 7.3 × 10⁶ nucleotide pairs. In human body cells the total length of the DNA is around 2 metres, about 6 × 10⁹ nucleotide pairs.
We must distinguish between the gametes or sex cells (which carry the information of heredity to the next generation) and the somatic or body cells. In the gametes (sperm and egg cells) the total length of the DNA threads is around 1 m, divided into 23 chromosomes. That represents 3 × 10⁹ nucleotide pairs. These can constitute 10⁹ words (triplets, each of three chemical letters).
The nucleotides are the four chemical letters of the genetic alphabet, called adenine, guanine, cytosine and thymine. Human body cells carry a dual batch of hereditary information – one from the father and one from the mother. So they have 2 × 23 = 46 chromosomes, corresponding to a DNA length of 2 metres (6 × 10⁹ nucleotide pairs).
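The lengths and pair counts quoted above are consistent with one another, as a rough check shows. The sketch below assumes a spacing of about 0.34 nanometres between successive nucleotide pairs along the double helix; this value is a commonly quoted figure and is not given in the text, and the function names are illustrative.

```python
# ASSUMPTION (not from the text): about 0.34 nm of helix length per nucleotide pair.
RISE_PER_PAIR_NM = 0.34


def dna_length_m(nucleotide_pairs):
    """Approximate stretched-out length in metres for a number of nucleotide pairs."""
    return nucleotide_pairs * RISE_PER_PAIR_NM * 1e-9


def triplet_words(nucleotide_pairs):
    """Number of three-letter 'words' (triplets) that fit into the sequence."""
    return nucleotide_pairs // 3


print(dna_length_m(3_000_000) * 1000)   # bacterial DNA, in millimetres: about 1
print(dna_length_m(3_000_000_000))      # gamete, 23 chromosomes: about 1 metre
print(dna_length_m(6_000_000_000))      # body cell, 46 chromosomes: about 2 metres
print(triplet_words(3_000_000_000))     # 1000000000 triplets
```

With this spacing, 3 × 10⁶ pairs give about one millimetre, 3 × 10⁹ pairs about one metre, and 6 × 10⁹ pairs about two metres, matching the round figures in the text.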
The number of possible genes can be estimated.
We start with an average sized gene product (protein), and look at the number of DNA building blocks (nucleotides) needed to code for that number of amino acids. Take, for example, human hemoglobin, the pigment in red blood cells. The alpha chain has 141 amino acids, the beta chain 146. Each amino acid needs three nucleotides to code for it, so for both chains we need 3 × (141 + 146) = 861 nucleotide pairs.
Therefore our DNA should theoretically be able to code for 3 × 10⁹ / 861, or roughly 3.5 million, genes coding for proteins the size of hemoglobin. In reality, however, the majority of the DNA consists of sequences which do not code for proteins, and their function is still unclear today (though some hints may be gradually emerging). Only some tens of thousands of genes actually code for proteins (estimates have ranged from 30,000 to 100,000). To put it another way: Only about three percent of the genome actually codes for proteins such as insulin or hemoglobin. Such program codes are identical for all people. Remarkably, in most instances, more than one gene codes for a given characteristic (e. g. eye colour).
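The hemoglobin estimate above is a short piece of arithmetic and can be written out as a sketch. The names below are illustrative; the chain lengths and genome size are the figures given in the text.

```python
NUCLEOTIDES_PER_AMINO_ACID = 3           # one triplet per amino acid
HAPLOID_GENOME_PAIRS = 3_000_000_000     # 3 x 10^9 nucleotide pairs


def coding_nucleotides(*chain_lengths):
    """Nucleotide pairs needed to code for chains of the given amino acid lengths."""
    return NUCLEOTIDES_PER_AMINO_ACID * sum(chain_lengths)


def max_genes(genome_pairs, gene_length):
    """Upper bound on the number of genes of one size that fit into the genome."""
    return genome_pairs // gene_length


hemoglobin = coding_nucleotides(141, 146)             # alpha and beta chain
print(hemoglobin)                                     # 861
print(max_genes(HAPLOID_GENOME_PAIRS, hemoglobin))    # 3484320, about 3.5 million
```

This is a theoretical upper bound only; as the text notes, the number of genes that actually code for proteins is far smaller.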
It seems necessary to assume that in addition to its protein coding portions, DNA contains countless additional levels of structure and function.
Such stored information concepts are just as much required to code for the development of the smallest organelles such as the mitochondria and ribosomes, as for building the large organs (e. g. heart, kidneys, brain) and the overall integrated organism. As yet, no one has been able to decode this incredibly complex system. Perhaps some light will be shed on this by research over the next few years.
If the total paternal contribution to heredity is contained in a sperm cell, and the maternal in an egg, then this must involve not just the total anatomy and physiology of a human being, but also our numerous predispositions and gifts, for example musical ability, aggressiveness, or language aptitude.
Are the non-material characteristics of people, for example, the ability to love, or experience joy, in fact reducible to being described by a nucleotide sequence? Here we still face major scientific mysteries.
Information processing
Our 30,000 genes provide exact instructions to each cell for manufacturing everything required for it to carry out the role for which it is programmed; whether hormones, enzymes, mucus, sebum, the weapons of the immune system, or the impulses in the nerve cells of the central nervous system.
One might well ask at this point: How is this information decoded, and how are these abstract “words” translated into concrete protein molecules? This never-ending process takes place inside an unimaginably small space, namely within the cells, each measuring only a few hundredths of a millimetre. Special protein molecules locate a particular piece of information – a gene – copy it, and prepare a messenger, a chemical relative of DNA called messenger-RNA. This mRNA then travels from the control centre in the nucleus out into the cytoplasm, to the ribosomes.
These small granular bodies are where protein synthesis takes place. When these RNA messengers arrive here, they specify the sequence in which the 20 types of amino acids, the building blocks of all proteins, are to be assembled. Protein molecules are constructed here “block by block”, just as a house is built brick by brick; they are subsequently dispatched to carry out their various vital functions.
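The decoding step described above, from gene to messenger-RNA to a chain of amino acids, can be illustrated with a toy model. This is a minimal sketch, not a description of the cell's actual machinery: it uses the standard genetic code in its usual compact table form, treats the input as the coding strand so that copying reduces to replacing T with U, and uses a short illustrative sequence. All function names are hypothetical.

```python
from itertools import product

# Standard genetic code, listed in the conventional U, C, A, G order.
# One-letter amino acid symbols; "*" marks a stop signal.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(dna_coding_strand):
    """Simplified copying step: the mRNA carries U wherever the DNA has T."""
    return dna_coding_strand.upper().replace("T", "U")


def translate(mrna):
    """Read the mRNA triplet by triplet and assemble the chain until a stop signal."""
    chain = []
    for start in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[start:start + 3]]
        if amino_acid == "*":
            break
        chain.append(amino_acid)
    return "".join(chain)


gene = "ATGGTGCTGTCTCCTGCCGACAAGTAA"    # short illustrative sequence
print(translate(transcribe(gene)))      # MVLSPADK
```

The table has 64 entries (all triplets of four letters) but only 20 distinct amino acids plus the stop signal, so several triplets specify the same building block.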
The next important step, namely the formation of specific structures like cells and organs from these protein molecules, is very complex, and is not yet fully understood. But we do know that it is somehow encoded in our genes*, and it largely determines what we are. Our genes ensure that we become human beings rather than animals.
Our gender, the colour of our eyes, skin and hair, and to a great extent our size, are all determined by our personal genome. It sets parameters for our intelligence and, to a large extent, determines our never-to-be-repeated unique personality. All these patterns are set at the precise moment in which the male chromosomes in a sperm cell meet up with those of a female egg cell (ovum).
The moment of fertilisation truly is the starting point of our life.
A comparison
Each of our approximately 100 million million (10¹⁴) cells has the following main components: a cell membrane, many pores and channels in this membrane, many mitochondria for regulating the flow of energy, many ribosomes which translate genetic information into proteins, and a nucleus containing the genetic information in the form of DNA.
Nowadays many people are familiar with the parts of a personal computer (PC), like hard disk, read/write head, interface, and network card. To explain the performance and complex functioning of a biological cell, Zoltán Takács, a biophysicist, compared the processing and storage of information in a cell with what happens inside a computer. If a cell in simplified form is regarded as a computer, we have the following analogies:
- The cell membrane would be the computer housing, but it is only 10 nanometres thick (= one hundred thousandth of a millimetre).
- The pores and channels are the interfaces.
- The mitochondria comprise 800 network cards.
- A ribosome would be a central processing unit (CPU), but a biological cell has more than six million CPUs.
- The nucleus would correspond to a hard drive.
There would then be 23 different hard disks (= chromosomes), each of which has its own backup disk. The storage capacities of these 23 disks add up to about 1 Gigabyte. Biological “hard disks” are actually not hard, but can perhaps be regarded more like “floppy disks”; the 46 strands of DNA do not rotate around a fixed spindle, but occur as loose clusters in the nucleus.
- The diameter of this biological computer is about 20 micrometres (= two hundredths of a millimetre).
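The storage figure in the analogy above can be related to the number of genetic letters by a rough conversion. The sketch below assumes that each of the four letters is worth two bits (the base-2 logarithm of 4); this encoding is an assumption made here for illustration and is not stated in the text, so the result should be read as an order-of-magnitude check only.

```python
import math

GENETIC_LETTERS = 4     # A, G, C, T


def capacity_bytes(nucleotide_pairs, alphabet_size=GENETIC_LETTERS):
    """Storage in bytes if each letter is encoded with log2(alphabet_size) bits."""
    bits_per_letter = math.log2(alphabet_size)      # 2 bits for four letters (ASSUMED encoding)
    return nucleotide_pairs * bits_per_letter / 8


# One set of 23 chromosomes: 3 x 10^9 nucleotide pairs.
print(capacity_bytes(3_000_000_000) / 1e9)          # 0.75 (decimal gigabytes)
```

Under this assumption one set of 23 chromosomes holds about 0.75 Gigabyte, the same order of magnitude as the figure of about 1 Gigabyte quoted in the comparison.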
It is obvious from even this small number of facts about “biological computers” that, next to the comparatively simple computers of our own technology, they are masterpieces of miniaturisation, complexity, and design perfection.
All cells in our body carry the same information, regardless of their locality (e. g. kidney, liver, or arm). But different cell types access and process different sections of the available information. As in the case of physical computers, the original information is not transferred to the CPUs.
Copies are made and transported. In a computer the read/write head is positioned at the beginning of an application program on the hard disk to copy it. But in a cell several reading heads begin at different locations to make copies simultaneously, so that the various pieces of information required for a certain type of cell are read from all the “hard disks” at the same time. A biological computer performs two kinds of “computations”: It provides the information for protein synthesis as described above, and it replicates itself by means of cell division.
The Genome Project
Scientists from all over the world instigated an ambitious project, intended to run for 15 years, to chart the human genetic code and decipher it letter by letter (this is known as sequencing). The Human Genome Organisation was founded for this purpose and boasts about 1,000 members in 50 different countries. The project officially began on 1st October 1990. The effort required to determine the letter sequence of human DNA was originally estimated at thousands of man-years. We have already got some idea of the enormous amount of information contained in the entire sequence. Our genes contain the complete human construction plan.
There, in code, we find instructions about how our eyes, ears or heart and all the physiological details of our bodies are formed, as well as all our abilities.
By the end of the year 2000 there was already much media euphoria about the complete sequencing of the human genome (the entire genetic information of human beings). Headlines such as “Life’s blueprint deciphered”, and variations on the same theme, were abundant. Associated with this, we were given a picture of a brave new world, in which all would now be possible: synthetic genes against AIDS would be prophylactically inserted into bloodlines. Alzheimer’s patients would receive transplants of genetically manipulated brain cells; cancer cells would, following the introduction of new genetic material, simply consume themselves; and gene-vaccinated transplant organs would no longer be rejected by the recipients.
Reports like this are extremely impressive but are unfortunately not quite true. What is the real status of the research? By the end of 2001, only 90 % of the letters of the genome had been deciphered; the remaining 10 % had not yet been deciphered with the required accuracy. It had been presumed that 100,000 genes were distributed among the 23 chromosomes. Now the number is thought to be more in the region of 30,000 to 40,000.
What have we gained if, as is hoped, we have the complete succession of ACGT letters for the human genome by 2003? Will we then possess the programme for life? Will we know how our creator coded our brains, for example? Not in the slightest! What we will have is comparable to the complete text of the Bible without commas and full stops, in a language which we do not understand.
It is therefore like a book that no-one can interpret. The actual work of translating the text (the semantics) won’t even have begun. It is unclear whether we will ever be able to decipher the genome. The Egyptian hieroglyphics were finally decoded because the Rosetta stone was found with a text in Greek, demotic and hieroglyphics.
After a long period of research, the hieroglyphics were finally deciphered on the basis of the Greek text. There is no such stone available to us for the human genome.
However, there is one very important thing we do know: nowhere is the information so densely stored as in the DNA molecule. As information is a mental not a material quantity, we deduce that this information cannot have developed within the material. An intelligent creator has to be behind it.
The structure of the DNA molecule
Chemically and structurally the DNA molecule is one of the most complex and versatile of molecules; this versatility is necessary to provide for all its functions. From the outside it looks like a double helix (Greek hélix = spiral), comprising two intertwined spiral strands. Each strand is a long molecular chain, and the two strands are essentially parallel, intertwined in a right-hand spiral. The genetic code comprises four chemical letters: adenine (A), guanine (G), thymine (T), and cytosine (C). Many genes do not consist of a continuous piece of DNA, but are made up like a mosaic of several separated segments.
In all cells, the genetic information stored in the DNA molecules controls protein synthesis, and another nucleic acid, ribonucleic acid (RNA), handles the transfer of all the information. In general, all cells of an organism contain identical DNA molecules, but not all genes are active at the same time in all the cells.
Proteins
Proteins are the workhorses of life. If we regard the DNA molecule as the blueprint of life, then the many different kinds of proteins are not only the bricks and mortar; they also make up the required tools, and some of them are the manual labourers which perform the actual construction jobs. Our genes provide the conceptual foundations (they store the “software”), but we are what we are (the “hardware”) because of our proteins. Both DNA molecules and proteins consist of long chains made up of strings of subunits, but their functions are fundamentally different.
DNA molecules comprise the genetic archives. On the other hand, proteins exhibit an unimaginable diversity of three-dimensional shapes, reflecting their multiplicity of functions.
Proteins serve as structural elements for the body, as messenger molecules, as receptors for messengers, as individual cell identifiers, and as substances defending against cells bearing foreign identifiers. Probably the most important proteins are the enzymes, which control the rate of biochemical processes by acting as catalysts. Certain enzymes can accelerate some reactions a millionfold or even more. Enzymes are also indispensable for the actual process of converting genetic information into its resultant products and processes.
Structure and chemistry of proteins
Although there are many amino acids, the Creator chose only 20 of them from which to construct all conceivable proteins (and thereby the structures) necessary for life. In the genetic code, three letters specify one amino acid, and every protein consists of an exactly determined sequence of amino acids. All the physical and chemical properties of an individual protein are determined by the length of the chain and the specific sequence of amino acids. The spatial disposition or folding of the chain is especially important. Proteins fold in such a way that the free energy is kept to a minimum; this means that a protein assumes the most “comfortable” shape. In principle one can only deduce the three-dimensional structure of a protein from the amino acid sequence, if all the forces acting on all of its thousands of atoms are known, as well as their effects on the surrounding molecules of the solvent. Such calculations are impossible in the present state of our knowledge, even using the most powerful computer systems**. But when the Creator made all living organisms, He constructed each and every protein in such a way that all the desired properties were obtained.
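The statement above that three letters specify one amino acid can be motivated by simple counting: with four chemical letters, words of one or two letters give too few combinations to distinguish 20 amino acids. The sketch below makes this explicit; the function name is illustrative, and the counting argument is offered as background rather than as something spelled out in the text.

```python
def min_codon_length(letters, amino_acids):
    """Smallest word length whose number of combinations covers all amino acids."""
    length = 1
    while letters ** length < amino_acids:
        length += 1
    return length


for length in (1, 2, 3):
    print(length, 4 ** length)      # 1 4, 2 16, 3 64

print(min_codon_length(4, 20))      # 3
```

One-letter words give 4 combinations and two-letter words 16, both fewer than 20; three-letter words give 64, which is enough.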
* With a possible inherited (maternal) contribution from the cytoplasmic structure of the egg cell.
** Remarkably, it now appears that the folding of many proteins after their construction (which would often be too slow if left to the physical forces acting on a particular protein’s components) is aided by specially tailored “chaperone” molecules.