This week I want to finish up talking about epistasis so that we can move on to multilevel selection. As you might imagine, I can talk about gene interaction all day. After all, we have not yet talked about long-term selection (epistasis figures big in that), nor have we talked about cyto-nuclear epistasis. But, hey, I am getting bored with this line of talk, and I suspect you are too...
So, back to today’s topic. One type of data that a lot of people gather is QTL data on any number of traits. One thing that can result from this is that you might end up with a pair of interacting loci. An old example I picked up a few years ago is from a teosinte by maize cross done by Doebley and colleagues (Doebley, J., A. Stec and C. Gustus 1995. Genetics 141: 333-346). Aside: As a starting graduate student my office was next door to George Beadle’s office. He was quite old by that time, but still a charming man. The main thing I learned from him was just how interesting corn was, and that despite what Paul Mangelsdorf might say, teosinte WAS the progenitor of corn. As a result the Doebley work made me very happy!
http://nrm101-summer2010.community.uaf.edu/2010/07/12/a-history-of-corn/
Doebley identified a pair of interacting loci (BV302 and UMC107, if you care). He put these QTL into a teosinte background and measured the “percent of cupules lacking a spikelet”. Don’t fool yourself, it was percent corn-like kernels. Because it was crossed into a teosinte background, even under the best of circumstances the kernels mostly looked like teosinte. In any case the values of the nine genotypes are:
Now what we would like to do is calculate the additive genetic variance, the dominance variance, and all the epistatic components of variance. These will, of course, change as the gene frequencies change. As a result we need to use a statistical method to calculate the variances. It actually turns out to be surprisingly easy to do this. My favorite program for this is JMP, although I have to say SAS has gotten more corporate over the years, and I suspect someday I will quit getting free licenses from my university and switch over to R, which is way more powerful. I will show you the method using JMP, but it really shouldn’t take too much to translate it into any language you might be interested in.
To calculate the variance components you first need to make a table that lists the genotypes, and for each genotype its frequency and its genotypic value. Finally, you need to list the eight independent genetic contrasts, one for each variance component. The contrasts should be scaled so that the frequency-weighted variance of each contrast is equal to one (use the maximum likelihood divisor N, not the unbiased N-1 divisor, or the trick won’t work). You will then want to do a linear regression of the actual genotypic values on these theoretical contrast values, weighted by the genotype frequencies.
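For anyone not using JMP, here is a minimal Python sketch of the table described above. It is an illustration under stated assumptions, not the JMP file itself: genotype frequencies are taken to be Hardy-Weinberg proportions at each locus with the two loci in linkage equilibrium, `p` is the frequency of the capital-letter allele, and every contrast name other than Additive_A is my own labeling. The function names `locus_contrasts` and `build_table` are hypothetical.

```python
import itertools
import numpy as np

def locus_contrasts(p):
    """Genotype frequencies plus unit-variance additive and dominance
    contrasts for one locus, indexed by copies of the capital allele (0, 1, 2).
    Assumes Hardy-Weinberg proportions and 0 < p < 1."""
    q = 1.0 - p
    freq = np.array([q * q, 2 * p * q, p * p])
    # Allele count, centered on its mean 2p and scaled by its variance 2pq.
    additive = (np.arange(3) - 2 * p) / np.sqrt(2 * p * q)
    # Dominance deviation, uncorrelated with the additive contrast.
    dominance = np.array([-p / q, 1.0, -q / p])
    return freq, additive, dominance

def build_table(p_a, p_b):
    """Return genotype labels, frequencies, contrast names, and the
    9 x 8 contrast matrix (assumes linkage equilibrium between loci)."""
    fa, add_a, dom_a = locus_contrasts(p_a)
    fb, add_b, dom_b = locus_contrasts(p_b)
    rows = list(itertools.product(range(3), range(3)))
    labels = [["aa", "Aa", "AA"][i] + ["bb", "Bb", "BB"][j] for i, j in rows]
    freq = np.array([fa[i] * fb[j] for i, j in rows])
    columns = {
        "Additive_A": [add_a[i] for i, j in rows],
        "Dominance_A": [dom_a[i] for i, j in rows],
        "Additive_B": [add_b[j] for i, j in rows],
        "Dominance_B": [dom_b[j] for i, j in rows],
        "AxA": [add_a[i] * add_b[j] for i, j in rows],
        "AxD": [add_a[i] * dom_b[j] for i, j in rows],
        "DxA": [dom_a[i] * add_b[j] for i, j in rows],
        "DxD": [dom_a[i] * dom_b[j] for i, j in rows],
    }
    names = list(columns)
    X = np.column_stack([columns[n] for n in names])
    return labels, freq, names, X

labels, freq, names, X = build_table(0.25, 0.75)
# Each contrast should have weighted mean zero, and the weighted
# cross-product matrix should be the identity (variance one, uncorrelated).
weighted_means = freq @ X
weighted_cross = X.T @ (freq[:, None] * X)
print(np.allclose(weighted_means, 0.0), np.allclose(weighted_cross, np.eye(8)))
```

The check at the end is the "variance of each contrast is equal to one" condition, computed with the frequencies as weights (the divide-by-N version).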
Note that the order is important since for this to work correctly you need to use type one, or sequential sums of squares. SAS and JMP insist on type 3, and they WILL give you the wrong answer. However, with a little digging you can ask for sequential sums of squares. So, in JMP your data table should look like this:
Note that there are a lot of hidden columns here. If you have JMP I am happy to send you a working file of this. Just shoot me an email.
Your model statement should look like this:
Now, here is the beauty of this little trick. When you have done the multiple regression as I outlined it, being careful to enter the independent contrasts in the correct order, making sure they have a variance of one, and using sequential type one sums of squares, you can go to the ANOVA summary and simply read off the variance components. That is, the sum of squares due to the Additive_A contrast is the additive genetic variance due to the A locus, and so on. Thus, in our teosinte maize cross at a gene frequency of 0.5 for both loci the ANOVA table looks like this:
and the variance components can simply be read off of the table. For example the additive genetic variance due to the A locus is 0.55125. Importantly, if the gene frequencies change so do the variance components.
[ANOVA tables at corn allele frequency = 0.25 and at corn allele frequency = 0.75]
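The same read-off can be sketched outside JMP. The Python below is a rough translation under assumptions that are mine, not the original analysis: Hardy-Weinberg proportions, linkage equilibrium, and a set of HYPOTHETICAL genotypic values standing in for the teosinte-maize data (so the numbers it prints are not the ones in the tables). The helper names `design` and `sequential_components` are made up for this example. It fits the weighted regression one contrast at a time and records the drop in weighted residual sum of squares, which is what sequential (type one) sums of squares are.

```python
import numpy as np

def design(p_a, p_b):
    """Genotype frequencies and the eight unit-variance contrasts.
    Assumes Hardy-Weinberg proportions and linkage equilibrium."""
    def locus(p):
        q = 1.0 - p
        f = np.array([q * q, 2 * p * q, p * p])
        add = (np.arange(3) - 2 * p) / np.sqrt(2 * p * q)
        dom = np.array([-p / q, 1.0, -q / p])
        return f, add, dom

    fa, add_a, dom_a = locus(p_a)
    fb, add_b, dom_b = locus(p_b)
    one = np.ones(3)
    # In the 3 x 3 grid, rows are locus A genotypes and columns are locus B.
    pairs = [
        ("Additive_A", add_a, one), ("Dominance_A", dom_a, one),
        ("Additive_B", one, add_b), ("Dominance_B", one, dom_b),
        ("AxA", add_a, add_b), ("AxD", add_a, dom_b),
        ("DxA", dom_a, add_b), ("DxD", dom_a, dom_b),
    ]
    names = [n for n, _, _ in pairs]
    X = np.column_stack([np.outer(u, v).ravel() for _, u, v in pairs])
    freq = np.outer(fa, fb).ravel()
    return freq, names, X

def sequential_components(values, freq, names, X):
    """Sequential (type one) weighted sums of squares, in column order."""
    y = np.asarray(values, dtype=float).ravel()
    w = np.sqrt(freq)

    def weighted_rss(M):
        # Weighted least squares via ordinary least squares on scaled rows.
        beta = np.linalg.lstsq(w[:, None] * M, w * y, rcond=None)[0]
        resid = w * (y - M @ beta)
        return float(resid @ resid)

    M = np.ones((y.size, 1))          # start from the intercept-only model
    rss = weighted_rss(M)
    out = {}
    for k, name in enumerate(names):
        M = np.column_stack([M, X[:, k]])
        new_rss = weighted_rss(M)
        out[name] = rss - new_rss     # variance attributed to this contrast
        rss = new_rss
    return out

# HYPOTHETICAL genotypic values, NOT the published data.
# Rows: 0, 1, 2 corn alleles at locus A; columns: same for locus B.
values = np.array([[0.0, 1.0, 2.0],
                   [1.0, 5.0, 10.0],
                   [2.0, 10.0, 40.0]])

for p in (0.25, 0.5, 0.75):
    freq, names, X = design(p, p)
    comps = sequential_components(values, freq, names, X)
    print(p, {n: round(v, 4) for n, v in comps.items()})
```

Because the weights are frequencies that sum to one, the sums of squares are already variances, and the eight components add up to the total genotypic variance at that set of gene frequencies.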
The interesting point is that there is a shifting of the variance components as gene frequencies change. Importantly, in a teosinte-type genotype there is almost no additive genetic variance. As corn-like characteristics are selected, the additive genetic variance blossoms. This leaves one with the interesting speculation that as corn was selected the process accelerated, as the shifts in variance components enhanced the response to selection. It is no wonder that it was so hard to find the progenitor of corn. Teosinte does not respond to selection very well, and yet corn apparently evolved very rapidly. Perhaps now we know why.
The local average effect of an allele is calculated as the weighted average effect of replacing a randomly chosen allele in a randomly chosen individual with the allele of interest. For example, if we wanted the local average effect of the A allele, some of the changes we would get include:
Genotype before substitution | genotype after substitution |
AABB | AABB |
AaBB | ½ AaBB, ½ AABB |
aaBB | AaBB |
The local average effect is then simply taken as the average of the difference in phenotypes before and after the substitution. If we do this for the BV302 allele in the teosinte corn cross we get the following figure:
Note that local average effects correct for the population mean, and are always a weighted deviation from zero; however, be aware that the population mean is becoming much more “corn” like, so that it is actually the teosinte allele that is relatively flat and the maize allele that is sailing up into the heights of corn-ness. Despite the perspective issue, it can still be seen why there is no additive genetic variance in the teosinte background. The alleles that make corn corn are nearly neutral in the teosinte background, and do not have a major effect on the phenotype until the corn suite of genes (or at least UMC107) becomes more common.
I am rather over-interpreting the data since it is only a single pair of loci being examined rather than the whole genome. Nevertheless, this provides some interesting speculation that many of the loci that originally gave rise to corn were originally nearly neutral variation in the ancestral teosinte, and it was the actual process of domestication that released this variation that was locked up in epistatic combinations. I am left wondering just how much genetic variation is hidden in loci that are nearly neutral in one genetic background, but become decidedly non-neutral as a population responds to selection.
As with the JMP program, if anybody is interested in an Excel spreadsheet for calculating local average effects and local breeding values for two-locus systems, shoot me an email and I will be happy to send you one that I have.
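If you would rather script it than use a spreadsheet, the substitution scheme described above can be sketched in a few lines of Python. This is an illustration only, with assumptions of my own: Hardy-Weinberg proportions, linkage equilibrium, HYPOTHETICAL genotypic values rather than the published ones, and a sweep that moves both loci along the same frequency path, which is just one way to scan the space and not a reconstruction of the figure. The function names `hw` and `local_average_effects` are hypothetical.

```python
import numpy as np

def hw(p):
    """Hardy-Weinberg frequencies for 0, 1, 2 copies of the focal allele."""
    q = 1.0 - p
    return np.array([q * q, 2 * p * q, p * p])

def local_average_effects(values, p_a, p_b):
    """Effects at locus A of substituting in the corn or the teosinte allele.
    values[i, j] is the genotypic value with i corn alleles at locus A
    and j corn alleles at locus B."""
    g = np.asarray(values, dtype=float)
    fa, fb = hw(p_a), hw(p_b)
    corn = 0.0
    teosinte = 0.0
    for i in range(3):
        for j in range(3):
            f = fa[i] * fb[j]
            # The randomly chosen allele is a teosinte allele with
            # probability (2 - i) / 2; swapping in corn moves i to i + 1.
            if i < 2:
                corn += f * (2 - i) / 2 * (g[i + 1, j] - g[i, j])
            # It is a corn allele with probability i / 2; swapping in
            # teosinte moves i to i - 1.
            if i > 0:
                teosinte += f * i / 2 * (g[i - 1, j] - g[i, j])
    return corn, teosinte

# HYPOTHETICAL genotypic values, NOT the published data.
values = np.array([[0.0, 1.0, 2.0],
                   [1.0, 5.0, 10.0],
                   [2.0, 10.0, 40.0]])

for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    corn, teosinte = local_average_effects(values, p, p)
    print(f"p = {p:.1f}  corn allele: {corn:8.4f}  teosinte allele: {teosinte:8.4f}")
```

Individuals that already carry two copies of the allele of interest contribute a difference of zero, and heterozygotes change half the time, which matches the substitution table above. Under these assumptions the two effects, weighted by their allele frequencies, sum to zero.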