Biology 466 Unsolved Problems Fall 2007

Alternative "Scientific Methods"

 

I) The classical scientific method, dating back to Bacon.
     (This is the same Mr. Bacon that some people claim wrote Shakespeare.)

A) Invent hypotheses.
B) Design experiments that should turn out different ways, depending on whether your theory is correct.
C) Write grant proposals to buy the equipment and chemicals, and pay technicians, Postdoctoral Fellows, and Graduate students needed to do the experiments you have designed.
D) Do the experiments.
E) If the results of the experiments are what your theory predicted, then invent more experiments.
F) If the results of the experiments disprove your theory, then invent another theory.

(Sometimes people prefer to do the experiments over again, until the results confirm the theory.)
[In fact, that is very often how scientists react!]

People really use this method most when something goes wrong in the lab: a funny smell, or a funny noise! "Is it a gas leak? Has the refrigerator gotten too warm?"
Is it this? Is it that? Check this. Check that. Hypothesize and test.

Another occasion when scientists use this method is when experiments don't confirm some theory that everybody believes is true, and that hardly anybody is willing to consider abandoning.
What went wrong? Did we measure out the right chemical concentrations? Is the pH meter broken?
Etc. etc. etc. In other words, they are less likely to conclude that their main theory is incorrect than to invent hypotheses about why the experiment has given the wrong answer. Notice the irony.
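(For those who like to see things written out explicitly: below is a deliberately silly toy sketch, in Python, of steps A through F as a loop. Nothing in it is real laboratory software; the "phenomenon" is just a coin with an unknown bias, and every number is invented. Its only purpose is to make the branching at steps E and F visible.)

```python
# Toy sketch of steps A-F as a loop. Everything here is hypothetical and
# invented for illustration; no real experiment works this simply.
import random

random.seed(4)
TRUE_BIAS = 0.7          # nature's secret, which the "scientist" does not know

hypothesis = 0.5         # step A: invent a hypothesis (a guessed coin bias)
for round_number in range(1, 8):
    # steps B-D: design and do an experiment (500 coin flips)
    flips = [random.random() < TRUE_BIAS for _ in range(500)]
    observed = sum(flips) / len(flips)
    if abs(observed - hypothesis) < 0.05:
        # step E: the result matches the prediction, so invent more experiments
        print(f"round {round_number}: hypothesis {hypothesis:.2f} survives (observed {observed:.2f})")
    else:
        # step F: the result disproves the hypothesis, so invent another one
        print(f"round {round_number}: hypothesis {hypothesis:.2f} rejected (observed {observed:.2f})")
        hypothesis = round(observed, 2)
```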

 

G) The very important concept of "experimental controls": When testing medicines, "placebos" are an important kind of control, to test whether apparent effects of some drug might have been the same if the patient had not taken the drug, but only a "sugar pill". That can often happen for psychological reasons. "Sham operated" animals are often used as surgical equivalents of placebos, in the sense of just cutting an animal open, and then sewing him up again, and letting the wound heal. Surgery itself can sometimes simulate the changes expected from removing a particular tissue.

Many experiments have been misinterpreted because the researcher didn't do the right controls, and it used to be a cliché among PhD students that MDs didn't do enough controls. Unfortunately, the word "control" can be misleading. I think it must come from keeping variables constant, like temperature. One speaks of controlling that variable. Other kinds of control include injecting the protein serum albumin, in experimental tests of the effects of some particular protein. This control experiment answers the question "Would other proteins have worked just as well?" Sometimes artificial nucleic acids, or DNA isolated from bacteriophages, are used as controls in experiments intended to test whether DNA of a particular gene will cause a certain specific effect.
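To make the logic of a placebo control concrete, here is a small simulated example in Python (my own toy numbers, not data from any real trial). Both groups "improve" somewhat just from being treated at all; only the comparison against the placebo group reveals how much of the drug group's improvement is really due to the drug.

```python
# Toy simulation of a drug trial with a placebo control. All numbers are
# invented. Both groups improve somewhat (the placebo effect); only the
# between-group comparison isolates the drug's real contribution.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 50
placebo = rng.normal(loc=2.0, scale=1.5, size=n)   # improvement from expectation alone
drug    = rng.normal(loc=2.8, scale=1.5, size=n)   # drug adds a real but modest effect

# Without a control, the drug group's average improvement looks impressive by itself.
print("drug group mean improvement:", round(float(drug.mean()), 2))

# The placebo group tells us how much of that improvement was going to happen anyway.
t_stat, p_value = stats.ttest_ind(drug, placebo)
print("drug minus placebo:", round(float(drug.mean() - placebo.mean()), 2),
      " p =", round(float(p_value), 4))
```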

A good example of a control that wasn't obvious was provided by Barbara Danowski (then a student here), who treated some cells with microtubule poisons after their cytoskeleton had rearranged in response to tumor promoters. Other scientists had hypothesized that the induced rearrangement of the cytoskeleton might depend on transport of materials along microtubules (for example by dynein). They tested this hypothesis by treating cells with medium that contained both microtubule poisons and tumor promoter chemicals; and they discovered that the cytoskeletal rearrangement didn't occur, which seemed to confirm their hypothesis about dynein-like transport. A key part of how Ms. Danowski earned her PhD, and then a professorship, was by testing what happens if you treat cells with the tumor promoter, let the cytoskeleton get disrupted, and THEN add the microtubule poison. She discovered that the cytoskeleton then goes back to its previous, normal, geometric arrangement in the cells, and that the cells also become more strongly contractile. This has major implications for the use of microtubule poisons in cancer treatment.

When you make a hypothesis, predict some particular experimental result from it, and then do the experiment and get the result you predicted, only the best scientists will seriously ask themselves: "Yeah, but did that result occur for some completely different reason?"

II) The "Double Helix" method of gaining scientific fame and fortune:

A) Figure out which scientific subjects are over-due for some revolutionary breakthrough.
(Good criteria are accumulation of paradoxes and pseudo-explanations for which there is really not much supporting evidence, and lots of surprising observations that no one expected or can explain, and that have been swept under the rug.)

B) Hang around places like Woods Hole, Cal Tech, Cambridge (either the English one or the Massachusetts one), and listen to conversations among scientific big shots.

C) Borrow other scientists' data, and base a new theory on it.

D) Be lucky, and disparage those who helped you succeed; concentrate attention on yourself.

Although I include this method facetiously, it is the method that we are going to use in this class!
(With the exception of "D", and that we have to read Nature rather than go to Cambridge)

 

III) The "Bioassay" method, (which was very popular from the 1930s until recently).

A) Hypothesize that the phenomenon you are interested in is controlled by a specific unknown chemical substance.

B) Invent a catchy name for this hypothetical substance, such as "serotonin", "auxin", "acrasin" or "florigen". (Respectively: the substance that controls smooth muscle contraction; the plant substance that controls shoot elongation, bending and root formation; the chemotactic attraction substance of cellular slime molds; and the substance that stimulates flowering in plants.) Note: florigen turned out not to exist, but the chemical nature of each of the others was eventually identified, in all cases many years after the substance was named.

C) Devise experimental criteria for distinguishing which extract of tissues produces the strongest effect, of the kind you have predicted is caused by your hypothetical substance. For example, the bioassay for auxin was elongation of oat seedlings; the bioassay for serotonin was strength of contraction of sheets of smooth muscle dissected from animal intestines. Notice that these methods use living materials to measure relative amounts of effects.

Although the word "assay" implies measurement of the amount of substances, the purpose of bioassays is to "home in" on the chemical nature of a substance, so what you need is a method that distinguishes which of two or more extracts produces MORE of the effect, and therefore presumably contains more of the hypothetical substance. Don't forget, this is all done before you know what the chemical is, or if it really exists. Once you have invented your bioassay, then you can centrifuge your extract to subdivide it into pellet and supernatant, and use your bioassay to find out which produces more effect (for example, more elongation of the oat seedlings, stronger contraction of the smooth muscle, attraction of more amoebae, etc.). Centrifugation is one of many methods for systematically subdividing material isolated from cells; other methods include chromatography, electrophoresis, etc.
Fractionation methods plus your bioassay eventually lead you to one particular chemical (you hope!).
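Here is a minimal simulation, in Python, of what bioassay-guided fractionation amounts to logically. This is my own toy model, not any real protocol: the "extract" is just a list of made-up compound names, exactly one of which carries the activity, and the "bioassay" is a noisy scoring function. Each round, the current material is split into two fractions, and whichever fraction scores higher in the bioassay is carried forward.

```python
# Toy model of bioassay-guided fractionation. The "extract" is a list of
# made-up compound names, exactly one of which carries the activity; the
# "bioassay" is a noisy score (think: millimeters of seedling elongation).
import random

random.seed(1)
extract = [f"compound_{i}" for i in range(64)]
active = random.choice(extract)          # the unknown substance being hunted

def bioassay(fraction):
    signal = 10.0 if active in fraction else 0.0
    return signal + random.gauss(0, 1)   # biological noise

fraction = extract
while len(fraction) > 1:
    mid = len(fraction) // 2
    # stand-ins for pellet/supernatant, or any other way of splitting the material
    half_a, half_b = fraction[:mid], fraction[mid:]
    fraction = half_a if bioassay(half_a) > bioassay(half_b) else half_b
    print(len(fraction), "candidate compounds remain")

print("the bioassay converged on:", fraction[0], "| the true substance was:", active)
```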

Serotonin is 5-hydroxytryptamine (I think so; I am typing this from memory); auxin is indole acetic acid; acrasin is cyclic AMP; and florigen turned out not to exist, although for many years the evidence for it seemed at least as strong as the evidence for the others. It seems to have been some propagated change, rather than a signal molecule.

D) The key to much of pharmacology is chemical synthesis of structural analogs of biologically-important molecules. They often bind to receptors for serotonin, auxin etc. well enough either to stimulate or to inhibit those substances' normal effects. Billions of dollars worth of anti-depression drugs are synthetic analogs of serotonin; and billions of dollars worth of herbicides and root stimulators are synthetic analogs of auxin.

* Name the hypothetical substance.
* Invent a bioassay.
* Use fractionation methods.
* Discover which fraction produces the strongest effect in the bioassay (eventually test with the pure substance).
* Synthesize chemical analogs.

Then misuse the technology! For several years in the 1960s, more than half of all the auxin analogs synthesized in the world were being dropped on South East Asia, to kill the jungles and the rice crops.
(It must not have occurred to them to drop serotonin analogs, or they probably would have.)

E) Bioassay methods work best for phenomena that are controlled by a single specific chemical substance, which acts either as a signal from some cells to other cells (as hormones do), or as a limiting factor (as vitamins are) for particular biological processes. The power and popularity of bioassay methods was reflected by a heavy emphasis on vitamins and hormones in introductory biology textbooks and courses from the 1940s through the 1980s, and even into the present.

I hope students will ask themselves whether the bioassay approach to research has any particular "blind spots". For example, what if a signaling molecule is itself very unstable, or gets quickly broken down by enzymes? The latter is quite common for signal molecules that need to produce a brief effect. Quick destruction by enzymes delayed the chemical identification of acrasin for 30 years, but somehow wasn't that great an obstacle for the identification of acetylcholine ("Vagusstoff"). Another difficulty is when many different chemicals mimic the effect of the true signal molecule being sought; this notoriously obstructed discovery of the chemical basis of Spemann's "embryological induction" mechanism for controlling the location of the neural tube and body axis. Complex negative or positive feedback cycles also tend to be either missed, misinterpreted or over-simplified by bioassays. So do situations in which something like an enzyme increases the activity of a signal molecule, and therefore gets misinterpreted as being itself a signal molecule. Furthermore, if cell-cell signaling happens to be accomplished by any method other than secretion of a chemical by one cell and detection of that chemical by another cell (which is the working assumption that underlies the design of almost all bioassays), then bioassays will be blind to that mechanism.

Prior to invention of the chemiosmotic theory of how mitochondria, chloroplasts and procaryotes make ATP, hundreds of "coupling factors" were discovered, none of which exist. "Florigen", the plant hormone that induces flowering, was the subject of many bioassays, but also turned out not to exist.

 

IV) The "Genetic Screen" method (which is now as heavily used as the bioassay method once was).

A) Choose one of the 30-plus "model organisms": such as T2 bacteriophage, E. coli bacteria, budding yeast, Drosophila melanogaster flies, C. elegans worms, mouse, Arabidopsis (dicot plant), Chlamydomonas (green alga), Zebra fish (just like the ones people keep in home aquaria), Xenopus frogs, etc.

B) E-mail or write to the NSF-supported "stock center" of whichever model organism you have chosen, and ask the stock center to mail you specimens with whatever genotype you want. They will have hundreds or thousands of different mutant lines to choose between. For example, the Arabidopsis stock center is at Ohio State, and provides this important and necessary service to biologists all over the world. Research on each model organism could hardly continue without a stock center, which also provides advice, help, coordination of research, and organization of national and international scientific meetings.

C) Pick some interesting aspect of the organism's life, such as the bending of their flagella, or their photosynthesis, or their embryonic development, or their resistance to drought, or how fast they age.

D) Perhaps treat some of your organisms with mutagens; or maybe just let mutations occur spontaneously.

E) Grow thousands or millions of individual organisms.

F) Invent or borrow a "genetic screen" method, which means a method or criterion for finding just those individual organisms that are abnormal in whatever aspect of their life (like bending of flagella, resistance to drought, response to some hormone, abnormality of embryonic development at a certain stage, etc. etc.) It's best if you can invent some way that the organisms will somehow separate themselves, as a result of mutations that change whatever property you are interested in. For example, if you could create a condition in which all those without the kinds of mutations you are interested in will swim away, or fly away, or maybe just die. That means a lot less work for you, than if you had to look at each individual to see if whether is has been mutated in the particular way you are interested in.

To the extent that your "screen" (method) results in all the non-mutants separating themselves automatically, then you will be able to find much rarer mutations, much more quickly. There are scientists whose careers are based mostly on having invented some particularly cunning and powerful genetic screen; just as there were scientists whose careers depended mostly on having invented a good bioassay.

G) Accumulate as many new mutant lines of the organism as possible. Then use classical genetic crosses, or maybe other methods, to find out how many different genes these mutations affect. Note that it would not be unusual for somebody to isolate two hundred genetic variants, but then (by much hard work!) discover that only 3 or 4 (or 1) genes are involved, in the sense that 74 of their mutations change gene number 1, 51 of their mutations affect gene number 2, and 13 of their mutations affect gene number 3, and that no other genes were ever mutated in a way that altered the phenotype of interest, or at least none that the genetic screen managed to filter out of the population.

The working assumption is that you can find mutations for ALL genes which affect the phenomenon in question, and thereby find and identify ALL proteins whose function affects this phenomenon. There is some similarity to strip mining, both in the completeness of what is found and also the amount of hard work needed. When many mutations in the same genes have been found, but no new genes are being found, that is evidence that you have found all the genes that affect the property of interest.
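The "saturation" argument can be illustrated with a little simulation (again, my own invented numbers: four hypothetical genes presenting mutational targets of very different sizes). Notice how the count of new genes flattens out, even while new mutants keep accumulating.

```python
# Toy simulation of screen "saturation". Only four hypothetical genes can
# mutate to give the phenotype, but they present mutational targets of very
# different sizes, so the rare ones take a long time to show up.
import random

random.seed(2)
genes = ["gene_1", "gene_2", "gene_3", "gene_4"]
target_sizes = [0.50, 0.30, 0.15, 0.05]      # chance that a new mutant hits each gene

genes_seen = set()
for n_mutants in range(1, 201):
    genes_seen.add(random.choices(genes, weights=target_sizes, k=1)[0])
    if n_mutants in (10, 25, 50, 100, 200):
        print(f"after {n_mutants:3d} mutants: {len(genes_seen)} distinct genes found")
```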

What particular "blindspots" does the genetic screen method of research have?. Their classic blindspot is lethal mutations. If every mutation of a given gene kills the organism, they you just aren't going to find that gene. Notice that such a gene would probably be extremely important for normal functioning; otherwise mutating it wouldn't be so harmful. The extreme importance produces a sort of invisibility to genetic approaches. The existence of genes is (Always! What, "Always"? Yes "Always") detected by the isolation of mutant strains. (And I hope somebody detects my analogy to Gilbert and Sullivan's "Never! What, 'Never'?) Eventually, increased use of DNA base sequencing may circumvent this dependence on viable mutant strains; but so far it mostly hasn't.

One way to get around this problem is to look for temperature-sensitive mutations, which only produce their effect above some threshold temperature. Another way is to look specifically for recessive lethal mutations, in which some predictable and evident fraction of each generation dies. That was how Hox genes, and all that, were first discovered in flies.

Nevertheless, we should probably anticipate a tendency to miss proteins in which even the tiniest changes in amino acid sequence produce drastic results. As a far-fetched analogy, imagine that there were some special kind of star that burned out the film or other detector on which its image was focused; or imagine some marine animal that ate all the divers and cameras that came within sight of it. Such things would have a special kind of invisibility.

Somewhat the opposite problem is when any of two or more different proteins can (by itself) serve a given function. Then you are going to need double mutants to get certain phenotypes. In several embarrassing cases, so-called "knock out" organisms were created in which marker genes were inserted into genes for important proteins, but there was little or no phenotypic change. This gets explained away as the result of one protein being able to take over the function served by another. That is probably correct, but nobody seems to notice that it would imply some sort of feedback control system (it seems to me), to respond to the deficiency of one protein by amplifying the synthesis or activity of the other protein that is compensating for it. Speaking more generally, feedback control systems are seldom "on the radar screens" of people doing genetic screens. We warn students that it is naïve to attribute each specific property to one specific gene, as if two genes couldn't both affect the same property, and as if one gene couldn't affect several independent properties; but research papers frequently commit this exact sin.

Worst of all the deficiencies of the genetic screen method is that, in effect, it puts the hypothesis after the experiment. Its practitioners even claim the virtue of objectivity for doing this. For example, imagine a situation in which a given genetic screen has found three genes in which mutations alter a certain aspect of the phenotype, and one of these genes has a sequence homologous to known transmembrane receptors, and the second gene has a sequence similar to an intracellular kinase, and the third gene's sequence is similar to a known protein that is used as an extracellular signal. The interpretation will be that this aspect of the organism's life works by the signal molecule binding to the receptor protein, which then activates the kinase. And probably that's correct; but in case it isn't, no one will have any way to know. As was true of bioassays, the whole experimental approach presupposes a narrow range of possible answers. In case the true mechanisms work in some fundamentally different way from expectations, then the best we can hope for is that the results of the screen won't make sense. More likely, the results can be made to fit preconceptions. Results that confirm preconceptions are called "insights". Furthermore, it progressively comes to seem as if this methodological system has proven that all biological phenomena share a narrow range of mechanisms, because mechanisms can only be detected when their molecular components are recognizably similar to ones already discovered.

Unknown categories of molecular interaction cannot be discovered by recognizing close analogies between proteins. What does it tell you about the mechanism of normal self-tolerance to discover that this mechanism fails when a certain gene is mutated? For example, what if that gene is discovered to code for a certain kind of transcription factor? Or if such a gene is discovered to code for an enzyme that detaches phosphates from intracellular proteins? These facts might themselves tell you something worth knowing about the basic mechanism; or they might be used as foundations for designing actual experiments that would prove or disprove the truth of certain categories of mechanism. Does either of these facts move you an inch toward understanding whether anti-self lymphocytes are normally inactivated or are normally killed, or maybe re-programmed? How can they be made to weigh on the balance of the question whether the normal criterion for killing/inactivating/etc. lymphocytes is that their binding sites fit any antigen that is (one possibility) already present in the early embryo, or (second possibility) that lymphocytes are eliminated when their antigens have a high enough concentration in the body? If you answer "yes" to any of these questions, then please tell me how.

H) Although DNA sequencing is not inherently a necessary part of the genetic screen approach, in practice the "model organisms" have mostly been the first to have their genomes sequenced. Furthermore, those genes that are found by screens are eventually matched up to particular base sequences, and their amino acid sequences thereby deduced.

If, for example, 5 genes are found by a genetic screen to affect some specific biological process, then it is apt to turn out that one gene codes for a cell surface receptor, and two of them code for tyrosine kinases, and one codes for a small GTPase, etc., from which the researcher will then create a PowerPoint presentation in which one protein is red and another green, etc., and the standard plausible "scenario" is then presented as if it were itself an observation. In effect, the hypotheses are being made after the data have been accumulated!

In combination, the stages listed above are probably the most powerful "discovery procedure" that has been or will ever be developed for Biology. It substitutes hard work for creativity (which its practitioners equate to bias). I also suspect that it tends to exaggerate the importance of signal molecules, and one-dimensional causal chains. Note that we face what amounts to a meta-theory, in the sense of absolute certainties about how to do science, and what kinds of mechanism can be true. That was also a weakness of the bioassay method of science.

I) Other techniques that are used as part of such research include in situ hybridizations, "Southern blotting", "Northern blotting", "Westerns", "knock outs", RNAi, "reporter genes", and "two-hybrid crosses", and new methods are being invented all the time.

Genetic screens are the basis of almost all of this, and although a Cell Biology or Genetics course may teach that such and such a conclusion was proven by "knocking out" a certain gene, that was a late stage in a long process that started with a genetic screen, and depended on a stock center.

Another Achilles heel of this approach is that different proteins may be able to substitute for each other. When that happens, even "knock out" experiments may produce little or no result. For example, there is a certain protein that was claimed to be the key signaling molecule that causes formation of the body axis. Then scientists "knocked out" this gene in mice. One might have expected the experiment to fail because of lethal results; in fact, it failed for the opposite reason: The only abnormality produced was that the baby mice's eyes opened a day earlier.

 

V) The Physicists' traditional form of the scientific method (by measuring and fitting equations).

A) Measure lots of variables, accurate to as many decimal places as possible. (for example, Tycho Brahe the early astronomer; or Balmer measuring light wavelengths)

B) Fit these measurements to some geometrical curve or equation. (for example, Kepler the early astronomer; a small curve-fitting sketch follows this list)

C) Invent some general laws of nature, which would generate those curves & equations. (Isaac Newton, in particular, invented this part of the approach!) (Bohr did it for Balmer's data)

D) Find as many different phenomena as possible, that can also be predicted by those same laws of nature.
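As a concrete miniature of steps A through C, here is a short Python sketch that fits Balmer's measured hydrogen wavelengths to the Rydberg form 1/λ = R (1/2² − 1/n²). The four wavelengths are the standard visible hydrogen lines, and the fitted constant should come out near the accepted Rydberg constant (about 1.097 × 10⁻² per nanometer). The code and its particular libraries are my illustration, not anything from the history.

```python
# Fit Balmer's visible hydrogen lines to the Rydberg form
#   1/wavelength = R * (1/2**2 - 1/n**2)
# The wavelengths below are the familiar H-alpha through H-delta lines (in nm).
import numpy as np
from scipy.optimize import curve_fit

n = np.array([3, 4, 5, 6], dtype=float)            # upper energy level of each line
wavelength_nm = np.array([656.3, 486.1, 434.0, 410.2])

def balmer(n, R):
    # predicted wavelength in nm, given the Rydberg constant R in 1/nm
    return 1.0 / (R * (1.0 / 4.0 - 1.0 / n**2))

popt, _ = curve_fit(balmer, n, wavelength_nm, p0=[0.01])
R_fit = popt[0]
print("fitted R =", R_fit, "per nm (accepted value is about 0.010974)")
print("predicted wavelengths:", balmer(n, R_fit).round(1))
```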

Newton modeled his laws of nature on Euclid's postulates; and he modeled many of his confirmations on Euclid's theorems. Logically, this was kind of as if he had been drawing conclusions about parallel lines not meeting based on measurements of squares of hypotenuses being equal to sums of squares of the other two sides of right triangles. This was revolutionary at the time. Newton didn't just notice that apples fall; everyone already knew that things fall. He calculated that the curvature of the moon's orbit can be predicted by the same equations that predict the speed of falling apples.

Critics of Newton complained that he provided no mechanism for gravity, neither in the sense of some material pulling on objects, nor even any logical reason why this force would need to vary inversely as the square of distance. In fact, he had tried and failed to discover such mechanisms; but he responded to critics with a famous Latin phrase, "Hypotheses non fingo" ("I frame no hypotheses"). The modern physicist Dirac responded to analogous criticisms by writing that it is childish not to be satisfied with accurate equations, and to want intuitively-plausible causation.

There is a series of famous quotes about measurements being the only valid form of knowledge:
Rutherford said "Qualitative is only sloppy quantitative", or something like that;
and Lord Kelvin wrote that "When you cannot measure a thing, then your knowledge is poor and meager..." etc., and the more decimal points the better. All the applied mathematicians with whom I have collaborated (which is a lot!) tell me that what I should do is measure the strengths and widths and osmotic pressures, etc. of cells and tendons and epithelia, to six decimal places. If I could tell them that cells exert, say, 0.00132465 dynes of pulling force, and that the Young's modulus of collagen is 0.214365 dynes per percent of elongation, per square millimeter of cross section, or things like that, then they would be able to use engineering equations that already exist to calculate how embryos develop.

They won't believe me when I tell them that cells and other biological structures are not nearly that consistent. Measuring their properties is not like measuring a crystal of some pure chemical. Not only are cells more complicated, they are not very consistent. So the real question becomes: what is it about the mechanisms of development that allows embryos to develop such consistent shapes, despite being made out of materials whose physical properties are much less consistent?
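Here is a toy numerical illustration of that point (my own made-up numbers, not real measurements): if a cell property varies by something like thirty percent from cell to cell, then the sixth decimal place of any single measurement carries no information; what is reproducible is the distribution, not the individual value.

```python
# Invented numbers only: a "measured" cell property with a large cell-to-cell
# spread. The spread dwarfs the precision of any individual reading, so
# quoting six decimal places of one measurement conveys nothing.
import numpy as np

rng = np.random.default_rng(3)
forces = rng.normal(loc=0.0013, scale=0.0004, size=100)   # hypothetical force per cell, in "dynes"

print(f"mean               = {forces.mean():.7f}")
print(f"standard deviation = {forces.std(ddof=1):.7f}")
print(f"range              = {forces.min():.7f} to {forces.max():.7f}")
```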

The physicist Eugene Wigner wrote a famous essay, "The Unreasonable Effectiveness of Mathematics in the Natural Sciences", by which he meant that although much mathematics was invented to apply to physics, even more mathematics was invented as pure abstraction, but then later turned out to be exactly what was needed to make sense of new discoveries in physics, like quantum theory and relativity. He wonders what this can mean. If pure mathematicians had invented other abstractions, could those also have fit later physics? Conversely, have we failed to discover significant masses of truth, for lack of the appropriate mathematical tools to organize them?

The mathematicians I. M. Gelfand and V. Arnol'd have written about "The Unreasonable Non-Applicability of Mathematics to Biology".

The physicist Richard Feynman joked that physicists "got there first, and took all the problems they could figure out how to solve, and drew a line around them, and labeled them 'Physics'. Then they took all the problems that were too hard for them, and labeled those 'Chemistry' and 'Biology'."

Please don't worry about all this physics. My point is that scientists have taken different approaches.
Last year, some German physicist-mathematicians working in Berkeley wrote to me to ask for copies of some photographs a collaborator of mine had taken 20 years ago, and of which we published about three. So I found the original negatives for those and lots more related experiments, and mailed them digitized versions. There were lots of interesting details that I had never noticed, and don't know how to interpret. The physicists have written and submitted a paper full of deductions. Depending on your training, you may be able to get much more information out of the same data.

If these mathematicians' manuscript gets accepted, I will tell the class more. The referee's criticisms should be very interesting.

 

VI) Still another kind of scientific method: Invention and Improvements of Techniques.

The physicist who invented phase contrast microscopy, Frits Zernike, earned a Nobel Prize in Physics for it. Prof. Salmon in this department has made enormous contributions to the improvement of light microscopy, mostly by applying electrical engineering methods to television images. Almost nobody saw this coming as a frontier of progress. As late as 1970, electron microscopy was expected either to continue as the leading edge, or to be replaced by some completely new kind of microscopy.

Cloud chambers, bubble chambers, particle accelerators, including cyclotrons, have all had major effects on physics. They all required combinations of creative imagination and skill in construction.

You, as a student, might want to take special note of any phenomena about which we keep having to guess, and make estimates, or that we assume must exist, but can't directly see. If you invent a machine or other method for measuring or mapping these phenomena (so that we don't have to guess!) then that will be another kind of "scientific method", and a major contribution to research.

 

VII) The motivational and financial "Scientific Method".
(For producing trustworthy science and identifying the best scientists, and other scholars)

A) When you discover something, you write it up, add photographs and drawings, and submit it to a refereed scientific journal, such as Nature, Science, PNAS, the Journal of Cell Biology, or the Journal of Experimental Zoology. Most of them want 3 copies, and/or digital forms of the manuscript.

B) The editor of the journal then chooses two other scientists, whom he regards as experts on the subject, and sends them copies of the manuscript (these days he more likely sends them a URL that they can use to produce a copy of the manuscript on their own computer), together with some kind of questionnaire or guidelines for criticism. These referees are anonymous, in the sense that the author of the manuscript is not told who they are, at least not without the referee's permission.

C) Each referee studies the manuscript VERY carefully, a process which often takes several whole days. They write out all their criticisms, suggestions for changes, perhaps some praise, and recommend whether the journal should publish the manuscript, often giving some priority score.

The form sent to referees by Science Magazine used to have a part that said: "This paper will be of major importance and interest to researchers in at least three different fields of science yes__ no___
If yes, then list the three different fields 1_________ 2__________ 3__________." (Gulp!)

There are not many papers about which you can list 3 such fields, and the papers actually published by Science mostly don't seem to satisfy this criterion; and in fact, they have since changed the form.

D) The most competitive journals only publish 5%-10%, or even less, of the manuscripts submitted to them. They also don't actually send most submitted manuscripts to referees. Instead, they have committees that pick out the best fifth, or some such fraction, of the manuscripts submitted. Another strategy is to call up potential referees, or e-mail them, and ask leading questions about whether they would be willing to referee a manuscript that says such and such. If you tell them "That was already discovered years ago!", or "That wouldn't be good enough evidence." or "That guy is nuts, and it's a waste of time to read anything he says!" then the telephone conversation becomes a short version of the refereeing process.

E) Scientists tend to submit their discoveries to the most competitive journal that they think might possibly publish them. Some people submit everything to Nature first, then after Nature rejects it, they submit it to Science, then after Science rejects it, they revise it and submit it to Journal of Cell Biology, etc. Note that it is considered unethical to submit the same research to more than one journal at a time! Don't ever do that. You have to wait until after one rejects you before you can submit the same, or even a revised version, to any other journal. Some journals also have written requirements about secrecy, for the papers they DO accept. Nature and Science both do. If a newspaper were to report something about it before the actual journal issue is published, the journal Editor would be furious, and might then not publish it.

I don't know how many manuscripts I have refereed, but it might be a thousand; this includes some for Science and Nature. The best journals mail both referees a Xerox copy of what the other referee wrote about that same paper. These are interesting to read, and also a stimulus to do the best and most responsible job you possibly can. In one case, I wrote a very negative review, and strongly argued that the manuscript should be rejected, at least unless a whole series of improvements were made.
It turned out that the other referee wrote an even more negative review. But the editor had what amounted to political reasons for wanting to publish that manuscript; so she didn't send either one of us copies of the other referee's criticisms, and told both of us that the other referee had been so positive, that it outweighed our criticisms. Neither one of us could believe that, and both of us wrote demanding to see the other's review. The other reviewer then called me up and asked if I had been the other referee, because he recognized my style of writing. We were really annoyed. Journal editors may sometimes be biased, but this was the worst case I know of.

F) If you want to know whether some scientific claim is trustworthy, ask whether it has been published in a refereed journal. For the last century, or a little more, journal publication has been what you might call the "gold standard". The better the journal, the more reliable the conclusions published in it, in general. For anything very complicated, there should also be subsequent papers in other refereed journals that allow longer and more detailed papers.

When newspapers or TV claim that sharks don't get cancer, and that eating cartilage will prevent cancers, or anything like that, the questions to ask are "Was that published in a refereed journal?" and "Which journal?". Conversely, anybody who discovers something important should submit it for refereeing and eventual publication. That's the standard, agreed-upon, not-perfect-but-the-best system for filtering and checking scientific claims. In athletic events, you have referees; and if somebody claims to be able to throw the shot put farther than others, then we only take the claim seriously if unbiased referees are appointed, who thereby take responsibility for the accuracy of the measurements of how far that person throws the shot put. The public understands this for athletics; but somehow it hasn't become part of general knowledge that scientific discoveries also have referees.

G) Refereed journal articles are called "The Primary Literature". The term "Secondary Literature" refers to "Review Papers" which you may (or may not) get invited to write by some editor, either a journal editor or an editor of special series of books. The purpose of a review article is to summarize and compare all of the primary literature on some particular subject, probably comparing different scientists' observations and conclusions, and maybe suggesting what further research is most needed. Editors frequently ask you to change parts of review articles, but it is unusual for them to be subjected to outside referees, especially not anonymous ones.

The word "monograph" refers to what amounts to a book-length gigantic review paper. These are also subjected to editing and refereeing. In the earliest stages, anonymous referees may be sent copies of manuscripts and asked for the most frank criticism. If their reports are not too negative, then the author and the publisher will agree on 2 or 3 or maybe a whole set of scientists to write criticisms of every page of the eventual book, and to be paid for their advice (in the range of hundreds of dollars per chapter, and a few thousands of dollars for the whole book. Every time I have agreed to review manuscripts for books, it started out seeming like easy money, but wound up being something like fifty cents an hour.

Even textbooks have outside referees.

H) To get jobs in science, especially to be a professor, publication of important research papers in refereed journals is the main criterion by which people are chosen and get promoted. This system 'kills two birds with one stone', in that the competition for publication not only provides criteria for the reliability of science, it also provides a means for individual people to prove that they are good scientists. Conversely, it motivates people to do the best science they can, and also to put that science into forms that are available to everyone in libraries.

This system has developed relatively recently, first in Germany almost two centuries ago, then in England in the later 1800s, and America from around 1890. In each country, adoption of this system was quickly followed by revolutionary improvements in the quality of universities.

Notice that other professions have very different systems. Lawyers have Bar Exams, although competition for the highest eventual ranks depends very much on being invited to be a student editor of the law review at your law school, and then on whether you "clerk" for a high-ranking judge.

Physicians have systems of tests, to qualify to practice medicine in each country and each state of the US. Each specialty within medicine has systems of "Board Certification", which would typically require completing a 2 (or more) year "Residency" (after the year of being an "Intern"), and then spending two or three entire days at some major medical center, and going around and examining actual patients with several leading specialists in the country. That's why you see all those framed diplomas in Physicians' offices, whereas professors are much more apt to decorate their offices with framed covers of journals.

Economically speaking:
The key Scientific Method is basing competition for jobs on publication in refereed journals.
Nobody told me that in college.

In Renaissance Italy, a standard method for earning professorships was some kind of public contest or debate. In mathematics, each candidate would submit a sealed list of 50+ problems to be solved, and then all the candidates would have to solve as many of the problems as they could (which included showing that they could solve the problems that they themselves had proposed!). Although this "contest method" was an unbiased way to find the best mathematicians, and also stimulated new discoveries, it had the undesirable tendency of encouraging mathematicians to keep their equations secret, so that they could use them to win professorships, and to retain those professorships against future contestants. The modern "publish or perish" method has the opposite effect.

Imagine an alternative society in which college professors were chosen on the basis of either a TV quiz program or some test like the SATs; or an alternative society in which judges were selected on the basis of which lawyers won the most cases, or were chosen by lotteries (like we DO choose jury members, and as the Athenians DID choose government officials).

 
