September 7: Nucleic Acids

 

sugars: ribose, glucose (a pentose and a hexose)

polysaccharides: cellulose, starch, etc.

A "deoxy" sugar is one that is missing one of its OH groups.

Deoxyribose has just an H in one of the places where regular ribose would have an -OH

purines and pyrimidines:
Please look at their structures in the textbook, and learn to recognize them, although you don't yet need to know them well enough to draw them yourself.

adenine and guanine are the two important purines in genes
cytosine, uracil & thymine (= methyl-uracil) are the pyrimidines used in genes

Base pairs: C-G, G-C, A-T, T-A, A-U, U-A

"Uridine" is uracil with a ribose attached
"Thymidine" is thymine with a deoxyribose attached
Add a phosphate to either one and you have a "nucleotide": a pyrimidine or purine with a ribose (or deoxyribose) & phosphate attached

nucleic acids: DNA, RNA

    
              Guanine          Adenine          Cytosine
    Phosphate-Ribose-Phosphate-Ribose-Phosphate-Ribose-etc.

      G   A   C   U   A   G   G   G   C   G
    p-R-p-R-p-R-p-R-p-R-p-R-p-R-p-R-p-R-p-R-p-

The diagram above is meant to be single-stranded RNA
R is for ribose, which is a 5-carbon sugar...
p is for phosphate, which is almost like a mineral...
G is for guanine, named after bird droppings (guano)...

    
    
    p-R-p-R-p-R-p-R-p-R-p-R-p-R-p-R-p-R-p-R-p-
    
      C   T   G   A   T   C   C   C   G   C     
       
      G   A   C   T   A   G   G   G   C   G
    
    p-R-p-R-p-R-p-R-p-R-p-R-p-R-p-R-p-R-p-R-p-

This is a diagram of double-stranded DNA (here R stands for deoxyribose rather than ribose)

Guanine pairs with Cytosine
Cytosine pairs with Guanine

Adenine pairs with Thymine (and with Uracil; because thymine is methyl-uracil)
Thymine pairs with Adenine

On at least some of the exams, you will be given an arbitrary sequence of these bases, like AATCGGCCAATTCG, and asked to write the complementary sequence.

In this case, TTAGCCGGTTAAGC would be counted as the correct answer.
If it were listed in the opposite direction, it would be CGAATTGGCCGATT.
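
The pairing rules above can be sketched in a few lines of Python. This is only an illustration: the names PAIRS, complement and reverse_complement are made up for this example and are not part of the course material.

```python
# Pairing rules for DNA, as listed in these notes.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complement(seq):
    """Return the complementary strand, written in the same left-to-right order."""
    return "".join(PAIRS[base] for base in seq)

def reverse_complement(seq):
    """Return the complementary strand, written in the opposite direction."""
    return complement(seq)[::-1]

print(complement("AATCGGCCAATTCG"))          # TTAGCCGGTTAAGC
print(reverse_complement("AATCGGCCAATTCG"))  # CGAATTGGCCGATT
```

Both printed answers match the two sequences given above for the exam example.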

DNA copies itself when its 2 strands separate and a new strand is made that is complementary to each of the original two.

Any sequence of 3 bases, like GGG or GAT, is called a "codon"

How many different codons are there?

4 times 4 times 4 = 64 different alternatives

If codons were only 2 bases long, then there would be 16 different ones.
If there were 6 different bases, then there would be 36 different two-base codons.
How many different 3-base codons would there be if there were 6 bases?

With 3-base codons, and 4 different bases, you can code for a maximum of 64 different amino acids.
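
The counting argument above (number of bases multiplied by itself once per position in the codon) can be checked with a short Python sketch. The function names count_codons and list_codons are invented for this illustration.

```python
from itertools import product

def count_codons(num_bases, codon_length):
    """Number of different codons: num_bases multiplied by itself codon_length times."""
    return num_bases ** codon_length

def list_codons(bases, codon_length):
    """Write out every possible codon explicitly, as a cross-check on the arithmetic."""
    return ["".join(p) for p in product(bases, repeat=codon_length)]

print(count_codons(4, 3))           # 64
print(count_codons(4, 2))           # 16
print(count_codons(6, 2))           # 36
print(len(list_codons("ACGT", 3)))  # 64
```

Calling count_codons with other numbers is one way to check your answer to the 6-base, 3-base-codon question.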

But many codons "mean" the same as each other.
GGG, GGC, GGA, GGT all mean "glycine"

But only TGG means tryptophan, and only ATG means methionine.
Serine and 2 other amino acids (leucine and arginine) have 6 codons each.

And a few of the codons mean "stop".
Like a period at the end of a sentence; they mean "Don't add any more amino acids to this protein, this is the end!"

TAA, TAG, and TGA are the 3 stop codons in most cells.
(Not to be confused with TGIF, which is the stop codon for humans. A small joke!)
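
The idea of reading a gene three bases at a time until a stop codon turns up can be sketched as follows. This is a toy example: the table holds only the codons named in these notes (a real table has all 64), and the names CODON_TABLE and translate, and the example sequence, are made up for illustration.

```python
# Partial codon table: only the DNA codons mentioned in these notes.
CODON_TABLE = {
    "GGG": "Gly", "GGC": "Gly", "GGA": "Gly", "GGT": "Gly",  # glycine (synonyms)
    "TGG": "Trp",                                            # tryptophan
    "ATG": "Met",                                            # methionine
    "TAA": "STOP", "TAG": "STOP", "TGA": "STOP",             # stop codons
}

def translate(dna):
    """Read the sequence codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        meaning = CODON_TABLE.get(dna[i:i + 3], "???")  # "???" = not in this toy table
        if meaning == "STOP":
            break
        protein.append(meaning)
    return protein

# ATG GGG GGC TGG TAA GGG: the last GGG is never read, because TAA ends the protein.
print(translate("ATGGGGGGCTGGTAAGGG"))  # ['Met', 'Gly', 'Gly', 'Trp']
```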

Questions that you should now be able to answer:

1) What if you have a chain of carbon atoms, all or most of which have -O-H groups attached?
What kind of chemical would that be classified as?

2) Ribose and glucose are specific examples of which kind of chemicals?

3) Can these kinds of chemicals be bonded together in long chains, with many in the chain?

4) What are these called, and what are two specific examples?

5) What would be meant by deoxy-glucose?

*6) What would you guess would be meant by dideoxyribose?
HINT: What about trideoxyribose? Tetradeoxyribose?

7) Why would tetradeoxyribose be more of an alcohol than a sugar?

*8) Dideoxyribose is poisonous: Figure out why?

9) Are there any sugars in nucleic acids?

Are they directly attached to each other, like in polysaccharides? Hint: not directly.

10) These sugars in nucleic acids are linked together by means of what chemical groups?

11) Where are the purines and pyrimidines located, relative to this backbone of sugar groups, in nucleic acids?

12) In a double-stranded nucleic acid, purines in one chain bind to specific (what?) in the other chain.

13) Guanine (G) binds to what? (base pairing)

A binds to what? T binds to what? C binds to what?

14) Notice that a purine (A or G) always pairs with a pyrimidine (T, U or C).

15) Can you figure out why purines don't pair with other purines, nor pyrimidines with other pyrimidines, in double-stranded DNA? (Hint: look at the sizes of their chemical structures)

16) The ???? sequence of DNA and RNA codes for the ??? sequence of proteins.

17) What does it mean to say that GGG codes for glycine?

18) If the Morse Code used only 3-symbol elements, then what would be the maximum number of different letters it could code for?

**19) Using just dots and dashes, and with all letters coded for by the same total number of dots and dashes, then how long would each sequence of dots and dashes need to be to code for all 26 letters?

**20) If the Morse Code had 3 different symbols (dots, dashes and bells), then what would be the total number of different 3-symbol units (that is, how many different letters could you code for)?

**21) In the actual Morse Code, letters have codes of different lengths: some have 4 symbols, others 3 or 2, and 2 letters are symbolized by just a dot or a dash. Therefore, besides the dot and dash, isn't there really a third symbol in this code? What is this third signal? Hint: how can you tell the boundaries between the codes for different letters?

22) Suppose that all letters were coded for by signals of the same length (same number of dots plus dashes for all letters of the alphabet), then would you need a third signal to mark the boundaries between one letter and the next? Hint: no; but do you see why?

23) If you make an analogy between the Morse Code and the genetic code, then what corresponds to the dots and dashes? What corresponds to the 26 letters of the alphabet? What corresponds to the A, T, C & G bases (adenine etc.) in DNA and RNA? What corresponds to the 20 amino acids?

What I want you to be able to do here is to fit all these into sentences like this: " The _______ in the genetic code correspond to the ________ of the Morse Code." and also understand these relationships.

24) Imagine an equivalent to the Morse code in which each of the 26 letters is represented by a fixed-length sequence of dots and/or dashes (the length you worked out in question 19), with no spaces to mark the boundary between one letter and the next. How much would the message be changed if one of the dots was misunderstood as a dash, or the reverse?

25) Why would such a mistake in Morse code be analogous to the mutation that causes sickle cell anemia?

*26) In the fixed-length code of question #24, why would it have a much worse effect on the message to leave out a dot or a dash, or to put in an extra dot or dash? (worse than mistaking a dot for a dash, I mean)

27) In actual DNA genes, what would you guess is meant by a frame shift mutation? Hint: these result in almost all the amino acids being "wrong" for long stretches of proteins, maybe even hundreds of amino acids. Usually, they also result in ending the protein at a much shorter length than usual.

28) In such cases, would you expect it to be the N terminal end (= the amino terminal end) which is missing, or for it to be the C terminal end (=the carboxy terminal end)? Or could it be either? Hint: no it couldn't.

*29) Sometimes two different frame shift mutations will almost cancel each other out, so that a virus that has both of these mutations will not be much more abnormal than a virus that has neither; can you figure out what is going on then? In other cases, having a combination of 3 different frame shift mutations results in much less abnormality than any one of them, or any combination of two.

*30) Results of that kind were the first direct evidence that the amino acids were coded for by combinations of 3 bases, rather than of 2 or of 4. (i.e. codons are 3 bases long) So the question is, can you figure out what the experimental results would have been if codons were 4 bases long? Or if they were 2 bases long? THEN how many different frame-shift mutations would need to be combined, near each other, to cancel each other out?

**31) Certain chemicals tend to cause this particular kind of mutation. But although nearly all chemicals that can cause mutations will also cause cancer, these frame-shift mutagens are somewhat an exception to this rule; they cause cancer only rarely. So the question is, what does that suggest about the nature of the changes in proteins that cause cancer? Under-active proteins? Over-active proteins? What? Or can one draw any conclusions, not even vague ones?

*32) Could it be that early life forms used 2-base codons, but fewer than 20 amino acids? If so, by looking at the genetic code, can you figure out which amino acids were probably used in these early life forms? Hint: why would glycine, alanine and serine probably have been among the original ones, while tryptophan was less likely to have been?

***33) George Gamow the physicist also participated in the early development of molecular genetics. One idea from that period (the "comma-free code", usually credited to Crick, Griffith and Orgel rather than to Gamow) was that the majority of codons might be nonsense, with only a minority coding for amino acids, in such a way that there would be only one reading frame (of the 3 alternatives). Try to figure out the maximum number of amino acids that could be coded for by such a code. Hint: AAA, CCC, GGG etc. would all have to be nonsense, because otherwise in AAAAAA, you couldn't tell where the boundaries are between codons. So 64 - 4 = 60, but then what fraction of the remaining 60 could be actual codons?

I know the answer to this one; but I have forgotten how to figure it out, logically. If some student will figure it out, and then explain it to me, I will be tremendously impressed. No kidding. Historically, the answer is quite funny, and misled a lot of people about what to expect the genetic code to be like.

**34) Can there be an evolutionary advantage of one set of codon meanings, relative to others? For example, if GAC meant glycine, and CAG meant alanine, etc.?

Imagining that it might be better if single base changes tend to produce the smallest changes in folding patterns of proteins, then would THAT make one code better than another, in terms of optimal choices of relations between codons and their amino acids? hint: yes; but figure out what it might be.

*35) Mathematical comparison of the actual genetic code with all alternative codes has been claimed to show that the actual code is optimized for minimum changes in protein folding. Assuming that this is correct, then what might this imply about whether other choices of codes were also tried during evolution? (Hint: of course it does, but what?)

***36) How many alternative codes are there, for 64 different codons, and what amounts to 21 things being coded for (20 amino acids and stop)?
Hint: it is some astronomical number, but is it really 64 factorial? (64*63*62*61*60 etc., which is what the answer would be if 64 codons were coding for 64 different amino acids)
Remember that many of the codons are synonyms (people say the genetic code is "degenerate", but that was before cable TV gave us new ideas of what "degenerate" might mean).

37) Compare the chemical structure of sugars to alcohols.

38) Are their chemical properties similar, or different?

39) How come sugars are solids, instead of liquids?

40) Imagine a 3-carbon analog to sugar; What would its structure be? Why might it be a gooey liquid?

41) Compare hexoses to pentoses?

42) Why are there really a lot of isomers to glucose, & to ribose?

43) Is it true that wood is mostly made out of sugar? yes

44) Why doesn't wood taste sweet? If you were a termite, would wood taste sweet?

*45) If "hepta" is the Greek root for 7, then what's a heptose? Would you guess that there really would be such a thing? Sure! Why might some heptoses be slightly poisonous to the kidneys, or at least make more glucose appear in human urine?

46) What are the names of two biologically-important purines?

47) Using the following words: Phosphate, Sugar, Purine, Pyrimidine
link them together with lines to indicate chemical bonds between them, draw the structure of DNA and RNA: First single stranded, and then double stranded

48) If all living things (plants as well as animals, etc.) use the same genetic code (GGG means glycine for all of them),
does that tell you anything about whether we all evolved from the same original life form?

49) If we found life on another planet, that evolved separately, would there be much chance that it would even use DNA as its genetic material, and make proteins out of the same 20 amino acids?

50) In the 1960s, several labs took DNA from plants (say) and put it in animal cells (and the reverse) to see if they would use it to make proteins having the same amino acid sequences.
So what did that prove about whether the same genetic code was being used by both?

51) What would have happened in these experiments if plants used different genetic codes from animals, etc.?

52) What would happen if a mistake was made in copying DNA, and a G got put where an A should be?

53) Would that always change the amino acid sequence of the protein that gene codes for? hint GGG=GGA etc.

54) What if AAA (= lysine) got mutated to TAA (= stop). Or suppose the reverse happened!
(Hint: would the protein made be the normal length?)

**55) Another kind of mutation (that does occur) is for a base to get left out, or for an extra base to be added.
Why does either change usually make the protein become much shorter than usual?

**56) Remember what is meant by "complementary" base sequences!
GGGAA.. is complementary to CCCTT..
so they can't both code for the same amino acid sequence! So which of the following might be true?

a) In some parts of the DNA, one strand codes for a protein, and the other strand isn't used, except for copying; and in other parts of the DNA, it's the other strand that codes for the protein. (HINT: this is the right answer!)

b) For every protein, there is always another protein being made which is coded for by the complementary base sequence?
(NOT!)

c) One of the strands of DNA codes for all the proteins, and the other strand codes for none of them?

d) Can you invent some other logical possibilities?
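
For the mutation questions above (27 and 52 to 55), it may help to experiment with how a sequence gets divided into codons. The sketch below is only an aid for exploring: the sequence "gene" and the helper names codons, substitute_base and delete_base are all made up for this illustration.

```python
def codons(dna):
    """Split a sequence into consecutive 3-base codons, dropping any leftover bases."""
    return [dna[i:i + 3] for i in range(0, len(dna) - 2, 3)]

def substitute_base(dna, index, new_base):
    """Copying mistake: one base is replaced by another."""
    return dna[:index] + new_base + dna[index + 1:]

def delete_base(dna, index):
    """Copying mistake: one base is left out."""
    return dna[:index] + dna[index + 1:]

gene = "ATGGGCGATTGGAAA"  # hypothetical sequence, not a real gene

print(codons(gene))                           # ['ATG', 'GGC', 'GAT', 'TGG', 'AAA']
print(codons(substitute_base(gene, 4, "A")))  # ['ATG', 'GAC', 'GAT', 'TGG', 'AAA']
print(codons(delete_base(gene, 4)))           # ['ATG', 'GCG', 'ATT', 'GGA']
```

Compare how many codons differ from the original in the second and third lines of output, and try other positions and sequences.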

 

 

 
