Why I like the phenotypic view: Epistasis and the mean field approximation

Before we move on to multilevel selection, I realized that I had not finished an important aspect of the story I was developing.  In particular, I did not step back and ask how all the talk of gene interaction affects the main thesis of this blog, the phenotypic view of evolution.  I tell others that it is always essential to step back and ask why what they are doing is interesting, and think it wise to follow my own advice.  Also, in fact, my thinking about epistasis was one of the important reasons I came to reject the genic view.  I promise promise promise that I will move on to multilevel selection next week.

There are two ways that the genic view fails. The first is philosophical.  Even if we could reduce selection to changes in gene frequency, that does not mean we should.  That is, because selection acts on the whole phenotype, our best understanding will come from a world view that focuses on the phenotype, and not from one that focuses on its parts.  An engineer wants to know what works, and does not particularly need to know why it works, but scientists want to know why things work.  If selection acts on phenotypes, then that is where we should focus our theory.  The importance of this will hopefully become more obvious when we talk about multilevel selection.  (Oh, and apologies to engineers.  They do care about why things work; it’s just that they don’t NEED to know the why to do their job.)

The second is practical.  In short, the genic approach doesn’t work.

Here is where a little history helps. Sir Ronald Fisher was the initial developer of what is now quantitative genetics (The Genetical Theory of Natural Selection). When Fisher developed his theory in the early 1900s (the first edition was published in 1930), the nature of genes was not known, and there was an ongoing controversy over whether the major mode of inheritance was particulate or continuous (Provine’s book is an excellent summary of the history of this period).  The point is that he made a number of simplifying assumptions that were clearly not true, but were nevertheless reasonable first approximations.  In particular, he assumed that population sizes were infinitely large, that mating was random, and that traits were determined by an infinite number of loci, each with an infinitesimal effect.

It is almost certainly true that Fisher knew these simplifying assumptions were at best approximations.  He was well aware of epistasis, and he was aware of the problems of assortative mating.  My best guess (based on talking with Yaneer Bar-Yam, a complex-systems friend) is that Fisher understood the limitations of his assumptions, but when the assumptions of infinite population size and the like were relaxed by later investigators, the consequences were assumed to still hold.  In particular, in an infinitely large, randomly mating population every gene experiences every genotype in proportion to its frequency.  In addition, Fisher’s model assumes infinitely many loci, each with an infinitesimal effect, with the counter-intuitive result that selection does not change the frequencies of the underlying genes (or more correctly, it causes infinitesimal changes in gene frequency). Under these circumstances the average effect of an allele is constant and unchanging.  Even if there is epistasis, it averages out and can be ignored.  Fisher quite explicitly put epistasis into the residual, or environmental, variance.  Thus, Fisher’s assumption of additivity was perfectly reasonable given his assumption of infinite population size.  The problem is that later modelers relaxed the assumptions of large population size and random mating, but retained the assumption of additivity.

Selection never acts directly on the gene; however, under Fisher’s assumption of infinite population size and random mating – the “mean field approximation” – there is no practical reason why we cannot reduce selection on organisms to the sum of selection on the individual genes.  The problem is that the real world is not like this.  The number of loci is finite, population sizes are finite, and mating and interactions are not random.  As a result epistatic interactions do not average out, and the average effects of alleles change over space and time.
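To make this concrete, here is a minimal sketch (with made-up fitness values) of how the average effect of an allele depends on its genetic background once epistasis is present. In a haploid two-locus model at linkage equilibrium, the marginal "fitness of allele A" is just the background-weighted average of the genotype fitnesses, and it changes, even flipping sign, as the allele frequency at the other locus changes:

```python
# Hypothetical haploid two-locus model: the average effect of allele A
# depends on the frequency q of allele B at a second, epistatic locus.
def avg_effect_of_A(q, w):
    """Average excess in fitness of A over a, assuming linkage equilibrium."""
    wA = q * w["AB"] + (1 - q) * w["Ab"]   # mean fitness of A-bearing genotypes
    wa = q * w["aB"] + (1 - q) * w["ab"]   # mean fitness of a-bearing genotypes
    return wA - wa

# Epistatic fitnesses (made-up numbers): A does well with B, poorly with b.
w = {"AB": 1.2, "Ab": 0.9, "aB": 0.9, "ab": 1.1}

for q in (0.0, 0.5, 1.0):
    print(f"freq(B) = {q}: average effect of A = {avg_effect_of_A(q, w):+.2f}")
```

With these numbers, A is deleterious on an all-b background (effect -0.20) and advantageous on an all-B background (effect +0.30): the "value" of the gene is not a property of the gene alone.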

Where this becomes a problem for the genic view is that the value of a gene is ephemeral.  It is contextual and depends on a myriad of epistatic and social interactions.  In theory, at any given moment it is possible to assign fitnesses to individual loci, and caricature (by that I mean an imitation of a thing in which striking characteristics are exaggerated in order to create a grotesque effect) selection as if it were acting on individual genes.   The grotesque nature of this reductionism becomes apparent in the next round of random mating, or when gene frequencies change, or when the social milieu changes.  The reduction to selection coefficients on individual genes can be done again; however, the selection coefficients will be substantially different than they were in the previous generation.  As a result, fitnesses assigned to individual genes change from generation to generation, and they have no predictive power.  One has to ask whether a theory with no predictive power is of any use.

The second problem is that while it may in theory be possible to assign fitnesses to individual alleles at individual loci, there are some 25,000 loci in humans, which seems to be about right for most organisms.  Reducing selection to coefficients on individual loci requires solving a problem in 25,000 unknowns.  This problem is exacerbated by the fact that the genes are linked, and in linkage disequilibrium.  In short, it is what is called an NP-hard problem.  Colloquially, what that means is that while it is conceptually possible to assign fitnesses to individual genes, a computer large enough to solve the problem cannot even theoretically be made.  PLEASE NOTE:  I have been corrected on this.  NP-hard problems can be solvable for finite N, and in fact this problem might be tractable on a large enough computer.  The real problem is obtaining enough quality data.  (Thanks to Peter for this.)


Not even Deep Thought can solve the problem of assigning fitnesses to genes.

 

Of course there certainly are examples of single-locus diseases, such as sickle cell anemia, where we can assign individual fitnesses to genotypes; however, even here we find that these are not simple single-locus diseases.  Sickle cell anemia is particularly severe among the Bantu, but is apparently frequently much milder in Arabia, India, and Senegal.  Thus, even the archetypal single-locus genetic disease is really polygenic (A. Gabriel 2010. Nature Education 3:2).

The other exaggeration in my thoughts above is that many of those 25,000 loci are fixed, or have a small enough effect that they can be safely ignored.  However, remember that if there are only two alleles per locus, the number of possible combinations goes up as 2^n, where n is the number of loci, a number that grows very quickly as the number of loci increases.  Somewhere above about 20 loci this starts to become an NP-hard problem, and certainly beyond the range of experiments with real organisms.
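The growth is easy to see with a few lines of Python, counting the 2^n haploid combinations of two alleles per locus; by 20 loci there are already over a million:

```python
# Number of possible multilocus combinations with two alleles per locus:
# 2**n grows exponentially in the number of loci n.
for n in (1, 10, 20, 30):
    print(f"{n:>2} loci: {2**n:,} combinations")
# Already at 20 loci there are 1,048,576 combinations to phenotype,
# well beyond any real breeding experiment.
```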

Now an important point:  I cannot prove this, but selection and drift both appear to have the effect of reducing dimensionality within populations.  Thus in any given population it will often be the case that an additive-dominance genic view works just fine.  The problem is a classic complexity issue of a model working for the wrong reasons.  That is, selection and drift can reduce complexity to the point that the genic view works WITHIN a population, but this reducibility will be heavily context dependent, and not descriptive of a gene’s performance outside of the specific population.

The bottom line is that no matter what model we use, nature will do what it will do.  Having a genic versus a phenotypic perspective does not change the result of evolution; it changes our perception of it.  My argument is that the reductionist genic view is constraining and simplistic, whereas the phenotypic view is a perspective that can embrace these complexities, and incorporate them into the phenotype-to-phenotype transition equation as needed.

 

Added after the fact:

OK, this is embarrassing, but I think Peter was wrong in his comment below, but so was I.  There are about 25,000 loci in humans, give or take.  If each one has two alleles (a serious underestimate), then there are three possible genotypes at each locus.  To fully implement the genic view we would need to construct and phenotype all possible genotypes, which would be 3^N (not 2^N).  When N = 25,000 this is a very large number, a number far larger than the number of electrons in the universe (usually estimated to be around 10^80).  Because the number is so large it is not even theoretically possible to build a computer that can store, let alone process, this information.  This is the essence of an NP-hard problem.  How large is 3^25000?  Basically infinity.  The best I can find is that 3^83 = 3,990,838,394,187,339,929,534,246,675,572,349,035,227.  My only excuse is that I just wanted to make the point that the number of possible genotypes is a very large number, and I made the statement without thinking it through.
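Python’s arbitrary-precision integers make it easy to check these magnitudes exactly, rather than getting “infinity” back from a desk calculator:

```python
# 3**83 is already a 40-digit number; 3**25000 has nearly 12,000 digits,
# dwarfing the ~81-digit electron count of the universe (about 10**80).
print(len(str(3**83)))       # digits in 3**83
print(len(str(3**25000)))    # digits in 3**25000
print(3**25000 > 10**80)     # far larger than the electron estimate
```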

2 Responses to “Why I like the phenotypic view: Epistasis and the mean field approximation”

  1. Peter:

    Thanks. I am a biologist, and rather new to the complexity language. Thanks for setting me straight.

    Actually, Peter, on further thought: if there are 2 alleles per locus (there are obviously many more), that means there are 2^25000 possible combinations. My computer’s built-in calculator returns infinity for that number. 2^64 is 18 quintillion. Now I am not sure how fast Deep Thought is, but this sounds like it is getting very close to a number that is too big to calculate.

    Phylogenetics runs into this problem a lot when searching trees. Phylogeneticists have developed sophisticated search algorithms, and we might well be able to come up with some that would give us a pretty good guess about what a gene “does” averaged across all possible genotypes. However, I am not volunteering to phenotype 18 quintillion organisms. . .

  2. Peter says:

    I suggest that you revise your paragraph about NP-hard problems: these are potentially intractable problems as the number of parameters n grows, but not necessarily unsolvable for some finite n. Heuristics often do a good job; a good example is phylogenetic tree estimation, and I guess approximating 25,000 parameters with enough data is not particularly difficult on a large computer cluster. Whether there is good enough data to do that is of course the more important question.
