Biology 11 Fall 2005 How DNA Codes for Protein

October 7: How DNA Base Sequences Code for Protein Amino Acid Sequences

I want to concentrate on learning TWO key ideas today.
1) The genetic code (the concept, NOT the details). You should be able to use a table of the codons to figure out which amino acid sequences (SIX of them!) are coded for by a given double-stranded DNA.
Guanine-Guanine-Guanine = GGG codes for glycine
Cytosine-Cytosine-Cytosine = CCC codes for proline
So if you had a double stranded DNA in which one strand had all guanines so that the other strand had all cytosines,

GGG GGG GGG GGG GGG
CCC CCC CCC CCC CCC

then it could code for either a protein consisting of Glycine-Glycine-Glycine etc. OR a protein consisting of Proline-Proline-Proline etc., depending on which strand was read.
On the other hand, if we imagine a double stranded DNA consisting of CGCGCGCGCGCGCGCGCG

CGC GCG CGC GCG CGC
GCG CGC GCG CGC GCG

this would code for Arginine-Alanine-Arginine-Alanine etc. (CGC = arginine, GCG = alanine)
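If it helps to see the codon reading spelled out, here is a minimal Python sketch (not part of the course material). It knows only the four codons mentioned above, and the names CODONS, reverse_complement and read_codons are made up for this illustration.

```python
# Only the four codons named in these notes; a real codon table has 64 entries.
CODONS = {"GGG": "Glycine", "CCC": "Proline", "CGC": "Arginine", "GCG": "Alanine"}
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(strand):
    """Return the partner strand, written in its own reading direction."""
    return "".join(COMPLEMENT[base] for base in reversed(strand))


def read_codons(strand):
    """Split a strand into consecutive triplets and look each one up."""
    triplets = [strand[i:i + 3] for i in range(0, len(strand) - 2, 3)]
    return [CODONS[triplet] for triplet in triplets]


all_g = "GGG" * 5
print(read_codons(all_g))                      # glycine five times
print(read_codons(reverse_complement(all_g)))  # proline five times

alternating = "CG" * 9                         # CGCGCG... (18 bases)
print(read_codons(alternating))                # arginine, alanine, arginine, ...
```

The same DNA gives two different proteins only because the two strands are read separately; nothing about the lookup itself changes.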
There is NO point in trying to memorize the genetic code,
in the sense of learning which amino acids are coded for by each of the 64 codons!!
Just learn the following key points (Roman numerals I through IX):

I) GGG happens to code for glycine (because it's easy to remember, and illustrates the principle)
II) There are a total of 64 different "codons"; because 4 cubed is 64.
There are 3 bases in each codon (hence cubed),
And there are 4 alternative bases (hence 4)
[If each codon were 2 bases long, then there would be only 16 different codons!]
[If there were only 2 bases instead of 4, then codons would need to be at least 5 bases long, in order to code for 20 amino acids]
III) Of these 64, 3 particular codons happen to mean "STOP".
IV) Each of the other 61 codons "codes for" a particular one of the 20 amino acids.
Notice this means lots of duplication, in the sense of synonymous codons that "mean" the same amino acid. Not just GGG, but also GGC, GGA, & GGT (GGU in RNA) all = glycine
(the jargon is to say that "the genetic code is degenerate")
V) If I provided you with a diagram showing all the codons, then you ought to be able to use it to figure out which amino acid sequences would be coded for by any arbitrary base sequence.
[But no biologist ever tries to memorize it; there is no point! Unlike learning state capitals in the 5th grade! Which are essential for fulfillment in life!]
VI) All life forms use (almost) the same genetic code, despite its apparent arbitrariness!
VII) Except for sequences like GGGGGGG or CCCCCCC, any given base sequence has 6 different alternative "reading frames" (3 possible starting points on each of the 2 strands), depending on where you start and which strand you read.
VIII) Eventually learn the distinctions between "duplication", "transcription" and "translation".
IX) Because of certain methods, it is now MUCH easier to find out the base sequence of DNA than to determine the amino acid sequence of a protein. So nearly always, biologists find out the amino acid sequences based on the DNA base sequence!
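Key points II and VII are really just counting, so they can be checked with a short Python sketch. The function names here are hypothetical, invented for this illustration, and the sketch only splits sequences into triplets; it does not translate them.

```python
from itertools import product

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def count_codons(n_bases, codon_length):
    """Number of different codons: (number of bases) ** (codon length)."""
    return n_bases ** codon_length


def shortest_codon_length(n_bases, n_meanings):
    """Smallest codon length that gives at least n_meanings different codons."""
    length = 1
    while n_bases ** length < n_meanings:
        length += 1
    return length


def six_reading_frames(strand):
    """Return 6 lists of triplets: 3 starting points on each of the 2 strands."""
    partner = "".join(COMPLEMENT[base] for base in reversed(strand))
    frames = []
    for s in (strand, partner):
        for offset in range(3):
            frames.append([s[i:i + 3] for i in range(offset, len(s) - 2, 3)])
    return frames


print(len(list(product("ACGT", repeat=3))))  # enumerates every triplet: 64
print(count_codons(4, 2))                    # 16 if codons were 2 bases long
print(shortest_codon_length(2, 20))          # 5 if there were only 2 bases
for frame in six_reading_frames("AAAGGAAGT"):
    print(frame)
```

For a sequence like GGGGGGG the three frames on one strand all come out as GGG GGG, which is why it is the exception mentioned in point VII.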

SECOND Key concept for today
Learn the essence of the main method used to sequence DNA (the Sanger method).
Dideoxynucleotide poisoning of DNA duplication.
The DNA to be sequenced is mixed with copying enzymes
plus deoxyATP, deoxyGTP, deoxyCTP, and deoxyTTP
(the DNA equivalents of ATP, GTP, CTP etc.)
Separate this mixture into 4 equal amounts, each in a different test tube.
Add a tiny amount of dideoxyATP to one of the 4 tubes,
Add a tiny amount of dideoxyGTP to one of the other 3 tubes,
& tiny amounts of dideoxyCTP and dideoxyTTP to the other 2 tubes
Let the enzymes copy the DNA for a while, so you get lots and lots of copies.
THE KEY FACT is that when a dideoxy nucleotide (such as dideoxyATP) is added to the growing DNA strand, that prevents further elongation,
because dideoxyribose lacks the 3' hydroxyl group to which the next subunit would bond.
For example,
suppose you were copying single stranded DNA whose tenth base happened to be its first Cytosine.
Then if your mixture of enzymes, deoxyATP etc. contained only dideoxyGTP instead of deoxyGTP,
then your copy would terminate at the 10th base, every time.
So you would get lots of DNA strands 10 bases long.
If your reaction mixture of enzymes and chemicals contained small amounts of dideoxyGTP, along with regular deoxyGTP,
then SOMETIMES your copies would stop at base 10,
and sometimes they would stop at base 15, and base 17, 24 etc.
(IF 15th, 17th, & 24th bases happened to be cytosine)
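The logic of chain termination can be sketched in a few lines of Python. This is a deliberately simplified illustration, not a model of the real chemistry: it numbers positions along the template in the order they are copied, ignores primers, and the name fragment_lengths is hypothetical. The 20-base template is the one used in the imaginary data in these notes.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def fragment_lengths(template, dideoxy_base):
    """Copy lengths that can occur in the tube containing one dideoxy base.

    A copy can stop wherever the dideoxy base gets incorporated, that is,
    wherever the template has the complementary base.
    """
    return [position
            for position, base in enumerate(template, start=1)
            if COMPLEMENT[base] == dideoxy_base]


template = "AAAGGAAGTCGGTTCGCGTT"
for dideoxy in "ATGC":
    print("dide" + dideoxy, fragment_lengths(template, dideoxy))
```

In the dideoxyGTP tube this gives lengths 10, 15 and 17, the positions of the cytosines in the template, which matches the example above.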
The method used to measure the lengths is electrophoresis
Imaginary data (electrophoresis gels from the 4 test tubes)

position  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
dideA                             )           )  )              )  )
dideT     )  )  )        )  )
dideG                                )              )     )
dideC              )  )        )        )  )           )     )
template  A  A  A  G  G  A  A  G  T  C  G  G  T  T  C  G  C  G  T  T
copy      T  T  T  C  C  T  T  C  A  G  C  C  A  A  G  C  G  C  A  A
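Reading a set of gels like this amounts to sorting the bands by length and noting which tube each band came from. Here is a minimal Python sketch of that bookkeeping. The band positions are my reading of the imaginary data above, and the names read_gel and lanes are made up for this illustration.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def read_gel(lanes):
    """Rebuild the copied strand from the band positions in the 4 lanes.

    lanes maps each dideoxy base to the fragment lengths seen in its lane.
    """
    base_at = {}
    for base, lengths in lanes.items():
        for length in lengths:
            base_at[length] = base
    return "".join(base_at[length] for length in sorted(base_at))


lanes = {
    "A": [9, 13, 14, 19, 20],
    "T": [1, 2, 3, 6, 7],
    "G": [10, 15, 17],
    "C": [4, 5, 8, 11, 12, 16, 18],
}
copy = read_gel(lanes)
template = "".join(COMPLEMENT[base] for base in copy)  # base-by-base partner
print(copy)
print(template)
```

Each length shows up in exactly one lane, so the four lanes together spell out the copy, and the template follows from base pairing.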
Questions that you should be able to answer:
1) Many mutations consist of substitutions of one base for another in DNA, and many of these cause replacement of one amino acid by another in the protein that the gene codes for. But how is it possible for the new base sequence sometimes to code for exactly the same amino acid sequence as did the non-mutant base sequence? These are called "silent mutations". HINT: Why would this NOT happen if the genetic code were not degenerate?
2) What specific amino acid is coded for by the base sequence guanine-guanine-guanine?
3) Some base substitution mutations cause considerable shortening of the protein coded for by that gene; how can that happen?
4) Mutations in which a base pair is lost out of the sequence (for example, if ...ACGTAC... -> ...AGTAC...) result in proteins that have wildly different properties, because their amino acid sequences become wildly different from that of the non-mutated protein, for the part coded past the mutation. Can you visualize and explain why that happens?
5) Would the result of adding an extra base pair be approximately the same?
6) What about inserting two extra base pairs? What about inserting three extra base pairs?
**7) Certain mutation-causing chemicals (called "frame shift mutagens") tend to cause changes of these kinds, with base pairs either dropping out, or extra pairs getting inserted. They are a partial exception to the rule that all mutagens also cause cancer. Can you see what this implies about whether cancer results mostly from over-activity of proteins, as opposed to under-activity?
*8) If there were only 2 alternative bases in DNA (like, say, just C and G), then how many different amino acids could be coded for by codons three bases long? What about codons four bases long? What about codons five bases long? Six bases long?
9) Some people have suspected that maybe a simpler version of the genetic code evolved before the one we have now, with this simpler version having codons only two bases long, and coding for fewer than 20 amino acids. If there were one STOP codon in this simpler code, then what is the maximum number of different amino acids it could code for?
**10) One can make educated guesses, based on the actual genetic code, about which particular amino acids would have been evolved first, and which ones not until the code became a triplet code. Hint: Glycine, alanine and serine would all be in the first list, and so would arginine; but not tryptophan or tyrosine! Can you figure out why this might be?
11) What aspect of the genetic code suggests that all living things on earth evolved from one common ancestral species?
12) What would be the effect of a mutation that caused GGG to become a codon for alanine, instead of for glycine? Hint: Why wouldn't it have any effect on a protein that doesn't normally happen to have any glycines? Another hint: why would it only change some of the glycines, even in proteins that do normally have them?
*13) If you knew the exact amino acid sequence of a given protein, then why could you NOT figure out the base sequence that codes for it?
*14) Suppose that you knew the amino acid sequence of the normal protein, and also knew the amino acid sequences of many mutant versions of the protein, that you believed resulted from single base substitutions, THEN how might you be able to figure out the base sequence of the gene?
**15) What about trying to deduce the base sequence of a gene based on the amino acid sequences of the normal protein, as compared with those resulting from a few frame shift mutations?
16) These days, how do scientists usually find out the amino acid sequence of a given protein?
17) Why do they do this so indirectly, instead of directly?
18) What effect does dideoxyATP have on synthesis of copies of DNA strands? Why?
*19) How is this effect an example of a common principle in drug design, widely used in pharmacology?
20) Suppose that enzymatic duplication of DNA in the presence of dideoxyATP resulted in the formation of lengths of DNA 5 bases long, 10 bases long, 15 bases long, 20 bases long, etc. What could you deduce from this about the base sequence of the DNA being copied?
21) If an exam were to include a sketch of 4 parallel electrophoresis gels, such as are used in the Sanger method for determining DNA base sequences, appearing and labeled like the following

A ) ) ) )
T )) ) ) ) ) )
C ) )
G ) ) etc.

Then could you figure out the base sequence of the DNA?
22) Could you draw the expected appearance of such gels, corresponding to a given base sequence?
23) Based on a sketch of the DNA fragment bands in such a set of gels, and if provided with a diagram of the genetic code, could you figure out the amino acid sequence of the gene being sequenced? Hint: Why could you find 6 alternative amino acid sequences, but not be quite sure which one of the 6 represents the actual protein?
24) Suppose that five out of these six alternatives have lots of stop codons, and only one out of the six goes for a few hundred amino acids, between stop codons! What would that suggest?
25) How could the relative frequency of stop codons be used to decide between alternative amino acid sequences? (And, yes; that is how it is done!)
*26) Suppose that somebody synthesized chemical analogs of each of the different amino acids, in which the carboxy groups were somehow defective, and unable to form a covalent bond to the amino group of another amino acid. Then invent a way these could be used to determine the amino acid sequences of proteins, as these were being synthesized. Hint: Why would you also need a method for separating amino acid sequences according to their total number of amino acids, independently of either molecular weight or conformational fitting?
**27) Why is it much more practical to separate DNA (or RNA, but somewhat less so) by length in bases, independent of particular base sequences, in contrast to the difficulty of separating chains of amino acids by length? (Hint: two reasons, sizes and conformations)
*28) Base sequence of DNA does sometimes have relatively small effects on its 3-dimensional bending patterns, but not as much as in RNA, and not nearly as much as do the amino acid sequences of proteins: how does this make DNA a "better" genetic material than either RNA or protein?
*29) Before DNA was proven to be the genetic material, almost everyone believed that its structure was too simple to serve that purpose: too few different alternative subunits; not enough differences in folding patterns. If you were looking for life forms on other planets, would you follow that same reasoning, or perhaps the reverse? In either case, explain your reasoning.
*30) Even before the exact genetic code had been worked out, experiments showed that plants and animals and bacteria and fungi all used the SAME code (a given codon "meant" the same amino acid in all organisms); can you figure out how it was possible to prove that?
*31) Suppose that you could somehow magically change your own cells so that GGG would "mean" alanine instead of glycine: why would you also have to mutate nearly all of your genes at the same time, in order not to die as a result?
32) IF you COULD change your code, and also mutate all your genes to compensate for the change, then how would the code change protect you from all future virus infections? Hint: would the protection come from the mutations, or from the code change?
*33) Given what you learned earlier about the rules for predicting effects of amino acid changes on folding patterns of proteins, then which changes in the genetic code would be least drastically harmful, as a rule? (Hint: Don't mess around with the codons for proline!)
***34) Look at the actual genetic code, and see if you can find any pattern relating one-base substitutions to the changes in conformation they would be expected to cause. I admit, to do this you would need to know which amino acids have charged side chains, which are least water soluble, etc.
*35) If such a pattern could be demonstrated statistically, would that imply that the code itself underwent very early evolutionary changes, and that the code we have is the result of selection between alternatives? (hint: sure)
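
Several of the questions above (1, 4 to 6, and 23 to 25) can be explored with a short Python sketch. It builds the standard genetic code from the usual 64-letter summary string, using one-letter amino acid abbreviations (G = glycine, A = alanine, R = arginine, P = proline, M = methionine) and * for STOP. The example "gene" and all the function names are invented for this illustration.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def translate(seq):
    """Translate consecutive triplets; leftover bases at the end are ignored."""
    return "".join(CODE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def six_frame_translations(seq):
    """Translate all 3 starting points on each of the 2 strands."""
    partner = "".join(COMPLEMENT[base] for base in reversed(seq))
    return [translate(s[offset:]) for s in (seq, partner) for offset in range(3)]


def stop_counts(seq):
    """Number of STOP codons in each of the 6 reading frames."""
    return [protein.count("*") for protein in six_frame_translations(seq)]


gene = "ATGGGGCGCGCGCCCTAA"                # invented example, 6 codons
silent = gene[:5] + "C" + gene[6:]         # GGG -> GGC, still glycine
one_deleted = gene[:3] + gene[4:]          # lose 1 base: frame shift
three_deleted = gene[:3] + gene[6:]        # lose 3 bases: one codon gone

print(translate(gene))           # MGRAP*
print(translate(silent))         # MGRAP*
print(translate(one_deleted))    # MGARP
print(translate(three_deleted))  # MRAP*
print(stop_counts(gene))
```

Losing one base changes every amino acid past the deletion (and here also loses the STOP), while losing three bases just removes one amino acid. On a real gene several hundred codons long, the frame with far fewer STOPs than the other five would be the obvious candidate for the one that is actually used.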
