Of um, drift, and M&Ms

Obviously there is much more to talk about concerning selection, and also a considerable amount of excitement about it. I got an email and a Facebook comment that are worth mentioning here. The email came from Michael Morrissey, a research fellow at the University of St Andrews, Scotland, who sent me a manuscript on using path analysis in selection studies. It was quite good, but it is not yet published, so keep an eye out for it. The Facebook comment suggested that I consider turning this into a book. The answer is yes, I am thinking about it; we will see if any publishers are interested. That said, I want to move on from selection to drift, and come back to specific topics in selection at a later time.

Most readers are familiar with genetic drift, which is change in gene frequency due to random sampling. I find that genetic drift is one of the hardest concepts to get across when I teach, so I will spend a few minutes on it. For starters, let's consider M&Ms. At one extreme, we have the case of a friend of mine who was very fond of chocolate and well liked by the staff where he was department chair. When he left for another institution, the secretaries gave him his weight in M&Ms as a parting gift. He was not a small man, and assuming he weighed about 200 lbs, that is about 41,000 M&Ms (http://eeunix.ee.usm.maine.edu/~white/mmissue/hardevidence.html). We can guess that if you had worked out the proportions of the colors in that gift, they would have been very close to 30% brown, 20% yellow, 20% red, 10% orange, 10% green, and 10% blue. These are the proportions that the M&Ms people aim for in their bags (http://dealnews.com/features/The-Color-Mixture-in-an-M-Ms-Bag-Is-a-Precise-Science-and-Other-Candy-Facts/626727.html). At the other extreme, the “fun size” M&M bags given out at Halloween have, on average, 18.36 M&Ms. Because of their small size, we expect a great deal more variability in the proportions of each color. Looking only at yellow M&Ms (supposedly 20% of the mix), the median number per bag is 3 and the mean is 2.96 (N = 25 bags). Thus, the average proportion yellow in the sample is 2.96/18.36 = 16.12%, not too far off from the target. That said, there is substantial variation among bags, and in this sample 4% of the bags (one in 25) had no yellow M&Ms at all.

Figure: Proportion yellow M&Ms (from http://www.statcrunch.com/5.0/viewreport.php?reportid=9365)

The point of this little exercise is that when sample sizes are very large, the observed proportions are very close to the expected proportions. When sample sizes are very small, on the other hand, the observed proportions can deviate substantially from the expected ones.
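
The same point can be made with a small simulation. This is a sketch, not the analysis behind the StatCrunch report: it assumes every candy's color is drawn independently from the target mix quoted above, and the function names (`fill_bag`, `yellow_proportions`, `summarize`) are hypothetical. It compares many 18-candy bags (roughly a fun-size bag) with many 41,000-candy "bags" (roughly my friend's gift). Under that assumption the number of yellows in a bag is binomial, so the standard deviation of the proportion yellow is the square root of 0.2 × 0.8 / n: about 0.094 for n = 18, but only about 0.002 for n = 41,000.

```python
import random
import statistics

# Target color mix quoted in the post (assumed to apply to every bag).
COLOR_MIX = {"brown": 0.30, "yellow": 0.20, "red": 0.20,
             "orange": 0.10, "green": 0.10, "blue": 0.10}


def fill_bag(n_candies: int, rng: random.Random) -> list[str]:
    """Draw n_candies colors independently from the target mix."""
    return rng.choices(list(COLOR_MIX), weights=list(COLOR_MIX.values()), k=n_candies)


def yellow_proportions(n_candies: int, n_bags: int, seed: int = 1) -> list[float]:
    """Proportion of yellow candies in each of n_bags simulated bags."""
    rng = random.Random(seed)
    return [fill_bag(n_candies, rng).count("yellow") / n_candies for _ in range(n_bags)]


def summarize(props: list[float]) -> tuple[float, float, float, float]:
    """Mean, sample standard deviation, minimum and maximum of the proportions."""
    return statistics.fmean(props), statistics.stdev(props), min(props), max(props)


if __name__ == "__main__":
    for size in (18, 41_000):
        mean, sd, lo, hi = summarize(yellow_proportions(size, n_bags=50))
        print(f"{size:>6} candies/bag: mean={mean:.3f} sd={sd:.4f} range=[{lo:.3f}, {hi:.3f}]")
    # Binomial probability that an 18-candy bag contains no yellow at all.
    print(f"P(no yellow | 18 candies) = {0.8 ** 18:.3f}")  # prints 0.018
```

The exact numbers depend on the seed, but the spread among the small bags should come out roughly fifty times wider than among the huge ones. The binomial model also puts the chance of an 18-candy bag with no yellow at about 1.8%, so seeing one such bag among 25 real bags is well within what random sampling alone produces.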

Genes behave much the same way. Since we are talking about random sampling with no selection, it really doesn’t matter how we group the genes; the standard grouping of two genes per locus in each individual is convenient but not important to us. Instead, all we need to know is the number of genes in the population, which for diploid organisms is 2N, or twice the population size. If 2N is very large, sampling will have very little effect on gene frequencies. If 2N is small, on the other hand, random sampling may have a huge effect on gene frequencies. Thus, we expect gene frequencies to take a “random walk”: the frequency should change randomly from one generation to the next, with the typical size of the change shrinking as the population grows (for an allele at frequency p, the variance of the change over one generation is p(1 − p)/2N, so the typical change scales with one over the square root of 2N):

Figure: “Simulation of genetic drift of 20 unlinked alleles in populations of 10 (top) and 100 (bottom). Drift to fixation is more rapid in the smaller population.” http://en.wikipedia.org/wiki/File:Allele-frequency.png
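
The random walk in the figure can be reproduced with a minimal Wright-Fisher sketch. It assumes pure drift (no selection, mutation, or migration), non-overlapping generations, and that each gene copy in the offspring generation is drawn independently, with replacement, from the parents. The names `wright_fisher`, `one_generation_variance`, and `mean_time_to_absorption` are hypothetical, and reading the figure's populations as 2N = 20 and 2N = 200 gene copies (10 and 100 diploid individuals) is an assumption. The sketch checks two things: the one-generation variance of the change in frequency is close to p(1 − p)/2N, and an allele starting at p = 0.5 is lost or fixed much sooner in the smaller population. For reference, the diffusion approximation (Kimura and Ohta) puts the mean time to loss or fixation from p = 1/2 at about 4N ln 2, or roughly 2.8N generations, so it should grow about tenfold between these two sizes.

```python
import random
import statistics


def wright_fisher(two_n: int, p0: float, generations: int, rng: random.Random) -> list[float]:
    """Allele-frequency trajectory under pure drift.

    Each generation the 2N gene copies are drawn independently, with
    replacement, from the previous generation (binomial sampling).
    Stops early once the allele is lost (0) or fixed (1).
    """
    count = round(p0 * two_n)
    trajectory = [count / two_n]
    for _ in range(generations):
        p = count / two_n
        count = sum(rng.random() < p for _ in range(two_n))
        trajectory.append(count / two_n)
        if count in (0, two_n):  # absorbed: no further change is possible
            break
    return trajectory


def one_generation_variance(two_n: int, p: float = 0.5, replicates: int = 5000,
                            seed: int = 2) -> float:
    """Sample variance of the change in allele frequency over one generation."""
    rng = random.Random(seed)
    changes = []
    for _ in range(replicates):
        traj = wright_fisher(two_n, p, 1, rng)
        changes.append(traj[1] - traj[0])
    return statistics.variance(changes)


def mean_time_to_absorption(two_n: int, p0: float = 0.5, replicates: int = 100,
                            seed: int = 1) -> float:
    """Average number of generations until the allele is lost or fixed."""
    rng = random.Random(seed)
    times = [len(wright_fisher(two_n, p0, 100_000, rng)) - 1 for _ in range(replicates)]
    return statistics.fmean(times)


if __name__ == "__main__":
    for two_n in (20, 200):
        print(f"2N={two_n:>3}: var(change)={one_generation_variance(two_n):.5f} "
              f"(theory {0.25 / two_n:.5f}), "
              f"mean generations to loss/fixation={mean_time_to_absorption(two_n):.1f}")
```

Drawing each gene copy with `rng.random() < p` keeps the sketch to the standard library; for large 2N a direct binomial draw from a numerical library would be much faster.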

Much has been written about genetic drift, so, at least for the moment, I will leave the discussion there. More importantly, we need to talk about how to move from the concept of genetic drift as the effect of random sampling of discrete particles to a more general concept of random change due to small population size, one that covers both change in the numbers of particles and random change in continuous traits.

The first question to ask is whether there are continuous traits that might be subject to random change due to small population size. I can think of, um, one obvious example in language, and that is the use of discourse particles. Discourse particles are words like “um”, “er”, and “uh” (or perhaps “argh” if you are a pirate, and “om” if you are a yogi) that we use unconsciously in conversation. These days lecturers make a concerted effort to avoid them, but they are an essential part of conversational language and have been for a very long time (Erard, M. 2007. Um . . . Slips, stumbles and verbal blunders, and what they mean. New York: Pantheon Books). If you speak Spanish, you probably say “este” or “pues”; if you speak Japanese, you probably say “eto”. Basically these are filler words with effectively no meaning, words we are often unaware of saying. As such, they are perfect examples of continuous parts of our phenotype that are subject to random change and are rarely, if ever, under selection.

I am not much of a probability theorist, so I will only outline how I would model this. My thought is that for any continuous trait, such as the use of words like “um”, each person (at a particular point in their life) can be thought of as having a fixed value of the continuous random variable that is their value of the trait. Other developing individuals will pick up these values as part of their patterning node, but of course, since THERE IS NO SUCH THING AS A MEME (sorry for the sudden outburst of caps lock syndrome), the value of the trait that they pick up will be a random variable centered on some weighted mean of the values of the “parents” they copy. Clearly, in a very large population every individual would have a random value of the trait that was slightly different from everyone else’s, but the mean of the trait would change very little between generations. In small populations, on the other hand, this imperfect transmission, together with the loss of older forms as individuals die, can lead to substantial shifts in the distribution of the continuous trait. This may well be the reason that we no longer “hem and haw” as we did in the past, but instead “um” our way through the day. Meanwhile, who knows what happened to “er”.
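
The verbal model above can be turned into a toy simulation. This is only a sketch of one way to formalize it, and it rests on assumptions that are not in the text: non-overlapping generations, each newcomer copying the equal-weight mean of three randomly chosen members of the previous generation, and normally distributed copying error with standard deviation 1. The names `next_generation`, `mean_trajectory`, and `step_variance` are hypothetical. Under these assumptions each newcomer's value has variance V/k + σ² around the current population mean (V being the trait variance in the population, k the number of models copied, σ the copying error), so the change in the population mean from one generation to the next has variance of roughly (V/k + σ²)/N. Large populations barely move; small ones wander.

```python
import random
import statistics


def next_generation(values: list[float], n_models: int, noise_sd: float,
                    rng: random.Random) -> list[float]:
    """Replace the whole population: each newcomer averages n_models randomly
    chosen individuals (equal weights, with replacement) and adds Gaussian
    copying error. Equal weights and Gaussian error are assumptions."""
    newcomers = []
    for _ in range(len(values)):
        models = rng.choices(values, k=n_models)
        newcomers.append(statistics.fmean(models) + rng.gauss(0.0, noise_sd))
    return newcomers


def mean_trajectory(pop_size: int, generations: int, n_models: int = 3,
                    noise_sd: float = 1.0, seed: int = 0) -> list[float]:
    """Population mean of the trait in every generation, starting from all zeros."""
    rng = random.Random(seed)
    values = [0.0] * pop_size
    means = [statistics.fmean(values)]
    for _ in range(generations):
        values = next_generation(values, n_models, noise_sd, rng)
        means.append(statistics.fmean(values))
    return means


def step_variance(means: list[float]) -> float:
    """Variance of the generation-to-generation change in the population mean."""
    steps = [after - before for before, after in zip(means, means[1:])]
    return statistics.variance(steps)


if __name__ == "__main__":
    for n in (10, 1000):
        means = mean_trajectory(n, generations=200)
        print(f"N={n:>4}: final mean={means[-1]:+.3f}, "
              f"variance of per-generation change={step_variance(means):.4f}")
```

Nothing in the sketch depends on the trait being “um”; any value copied imperfectly from a handful of models behaves the same way here. Unequal weights or overlapping generations would be natural extensions.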

There are two things that are obviously missing from this essay. The first is that I really had trouble coming up with a continuously inherited trait in non-social organisms. Does anybody know of any? Is my ignorance simply that, ignorance, or is it that they don’t exist, or that nobody ever thinks about them? The second is that it is very possible that the drift process for a continuous trait has already been worked out, and I simply don’t know about it. Perhaps some modification of the diffusion approximation will work just as well, or even better, for continuously inherited traits. I would love to know if anybody has any insights.
