Population structure and recombination

One of the joys of a genic view is the apparent constancy of things. One of the big ones is that a gene has an effect that can in some sense be considered a constant that can be written down and stored on a piece of paper in a mayonnaise jar on Funk and Wagnall’s front porch (that is a Carnac the Magnificent reference for those less than a million years old). As I have pointed out before, the way we measure the effect of an allele is to use a defined, usually homozygous, background and ask what the “mutant” and “wild type” alleles do to the phenotype (yes, I know the reality is more sophisticated than that, but really not by much!). The point is that the effect of the gene is only constant in the context of the simplified genetic background in which it is measured.


CARNAC THE MAGNIFICENT (Johnny Carson): Supercalifragilisticexpialidocious and constant gene action. ED MCMAHON (reading question): Name two phrases that have no meaning. CARNAC: May the fleas of a thousand camels infest your armpits.

I got to thinking about this and realized that there are other genetic parameters that we take as constants but that are actually functions of the context in which they are measured. The one I want to talk about today is recombination rate. Typically, recombination rate is simply the map distance between two genes, measured by the frequency of crossovers in a dihybrid cross.


A dihybrid cross tells us the recombination rate for a pair of loci. In this case the loci are linked with a recombination rate of 0.17 (From https://www.studyblue.com/notes/note/n/lecture-exam-2/deck/8120019)

This is all well and good, until you get to population genetics textbooks. They will then tell you about linkage disequilibrium (or gametic disequilibrium if you prefer). First a few fun asides. Linkage disequilibrium is actually the covariance between the allelic states at two loci. To see this, imagine we have an A locus with alleles A1 (value 1) and A2 (value 0), and a B locus with B1 (value 1) and B2 (value 0). The four haplotypes have frequencies of p11 (A1B1), p12 (A1B2), p21 (A2B1), and p22 (A2B2). Then the covariance is:

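$$\mathrm{Cov}(A, B) = E[AB] - E[A]\,E[B] = p_{11} - (p_{11} + p_{12})(p_{11} + p_{21}) = p_{11}p_{22} - p_{12}p_{21} = D$$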

Second fun aside: D is the determinant of a matrix of the frequencies of the gamete types in a population.

$$D = \det \begin{pmatrix} p_{11} & p_{12} \\ p_{21} & p_{22} \end{pmatrix} = p_{11}p_{22} - p_{12}p_{21}$$

What that all means I am not sure. That is other than demonstrating the obvious fact that all of nature is embodied in covariances and linear algebra.
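Both asides are easy to check numerically. The sketch below uses made-up haplotype frequencies (the numbers and the function names are mine, chosen only for illustration) and computes D once as a covariance and once as a 2x2 determinant.

```python
# Hypothetical haplotype frequencies for illustration only; they sum to 1.
p11, p12, p21, p22 = 0.4, 0.1, 0.2, 0.3


def covariance_d(p11, p12, p21, p22):
    """Cov(A, B) with A1 and B1 scored as 1 and A2 and B2 scored as 0."""
    p_a1 = p11 + p12  # frequency of A1, which is E[A]
    p_b1 = p11 + p21  # frequency of B1, which is E[B]
    # E[AB] is p11, because the product is 1 only for the A1B1 haplotype.
    return p11 - p_a1 * p_b1


def determinant_d(p11, p12, p21, p22):
    """Determinant of the 2x2 matrix of haplotype frequencies."""
    return p11 * p22 - p12 * p21


cov_d = covariance_d(p11, p12, p21, p22)
det_d = determinant_d(p11, p12, p21, p22)
print(cov_d, det_d)  # the two values agree up to floating-point rounding
```

With these frequencies both routes give a D of about 0.1, so the covariance and the determinant really are the same quantity.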

At this point your population genetics textbook will go on to tell you about the decay of linkage disequilibrium based on the recombination rate. It doesn’t take a lot of algebra to show that linkage disequilibrium decays as a function of the recombination rate. For example:

Equation 3

where the prime mark indicates the next generation and

Equation 4

At first blush this is beautiful. It demonstrates that simply by knowing the map distance between two loci we know the recombination rate, and with it the rate of decay of linkage disequilibrium. In other words, we assume that this classic measure of association is a property solely of the genome, and thus only the genome is responsible for the behavior of the genes.
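To make the decay concrete, here is a minimal sketch that assumes the standard textbook recursion for a randomly mating population: each generation, recombination shifts a frequency of rD between the coupling and repulsion haplotypes, which works out to D' = (1 - r)D. The starting frequencies are made up, the function names are mine, and r = 0.17 is borrowed from the dihybrid cross figure above.

```python
def ld(haps):
    """Linkage disequilibrium D from haplotype frequencies (p11, p12, p21, p22)."""
    p11, p12, p21, p22 = haps
    return p11 * p22 - p12 * p21


def next_generation(haps, r):
    """One generation of random mating with recombination rate r.

    Assumes the standard textbook recursion: the A1B1 and A2B2 haplotypes
    each lose r * D, and the A1B2 and A2B1 haplotypes each gain r * D.
    """
    p11, p12, p21, p22 = haps
    shift = r * ld(haps)
    return (p11 - shift, p12 + shift, p21 + shift, p22 - shift)


haps = (0.4, 0.1, 0.2, 0.3)  # hypothetical starting frequencies
r = 0.17                     # recombination rate from the dihybrid cross figure
d_values = [ld(haps)]
for _ in range(10):
    haps = next_generation(haps, r)
    d_values.append(ld(haps))

for t, d in enumerate(d_values):
    # Simulated D next to the closed form D_0 * (1 - r)**t.
    print(t, round(d, 6), round(d_values[0] * (1 - r) ** t, 6))
```

The two printed columns match in every generation, which is the sense in which the map distance alone seems to fix the rate of decay.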

Sadly, as is so often the case with the genic view, there is an assumption that has been hidden for so long that it is even lost from our intuition: that the population is unstructured. The important point to realize is that the only time recombination has any effect is in the double heterozygote. If either locus is homozygous, then a crossover event produces exactly the same gametes as are produced in the absence of crossing over. The problem is that population structure tends to reduce the frequency of heterozygotes. As a consequence, in a classically inbred population the rate of decay of linkage disequilibrium is reduced by a factor of (1-f)^2, where f is Wright’s inbreeding coefficient. The easy way to think about it is that f is the probability that the two alleles at a locus are correlated, usually because they are identical by descent (IBD), so (1-f) is the probability that they are not. If the alleles in an individual are IBD, that individual is by definition homozygous at that locus and crossing over will have no effect. The quantity is squared because homozygosity at either locus negates the effects of crossing over. Thus we can re-write the above equations as:

equation 5

and

equation 6

In other words, we can think of the “effective” recombination rate as r(1-f)^2.
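A quick way to get a feel for this is to ask how many generations it takes for D to fall by half at different levels of inbreeding. The sketch below takes the effective rate r(1-f)^2 as given and assumes D decays geometrically at that rate each generation; the function names and the f values are mine, and r = 0.17 is again borrowed from the figure.

```python
import math


def effective_recombination(r, f):
    """Effective recombination rate r * (1 - f)**2, as described in the text."""
    return r * (1.0 - f) ** 2


def generations_to_halve(r_eff):
    """Generations t such that (1 - r_eff)**t = 0.5.

    Assumes geometric decay of D at rate r_eff, with 0 < r_eff < 1.
    """
    return math.log(0.5) / math.log(1.0 - r_eff)


r = 0.17  # recombination rate from the dihybrid cross figure
results = []
for f in (0.0, 0.25, 0.5):  # hypothetical inbreeding coefficients
    r_eff = effective_recombination(r, f)
    results.append((f, r_eff, generations_to_halve(r_eff)))

for f, r_eff, half_life in results:
    print(f"f={f:.2f}  effective r={r_eff:.4f}  generations to halve D={half_life:.1f}")
```

Under these assumptions, an f of 0.5 cuts the effective recombination rate to a quarter of the map-distance value, and the association between the loci persists correspondingly longer.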

This seemingly trivial point is actually quite important. It emphasizes that population genetic parameters are a function both of the genes and the population in which they are measured. Even something as seemingly constant as recombination rate can be changed simply by changing the population structure, and the degree of mating among relatives.

It also has some fun adaptive storytelling implications. There is a battle between sex as a source of variation, and sex as breaking up well-adapted genotypes. This little exercise suggests that there is a middle ground: sex with relatives. Population structure limits the field of recombination, and has the effect of reducing the recombination rate among loci. One can imagine population structure evolving as a means of preserving local adaptations and reducing the effective recombination rate. Of course this would come at a cost of decreased heterozygosity, so perhaps that would be a different battle.

Finally, I should mention that in many mammals f_IS is often negative. This should have the effect of increasing the effective recombination rate. I leave it to you to make up adaptive stories for that one...

 
