Variance in a structured world

So far I have been writing about questions where I fancy I know something about the answer. Today, all I have is a conundrum, and it is this: one of the unspoken themes of this blog is that the “mean field approximation” is inadequate, and yet I am failing to provide an alternative. Sadly, this is the comment from my physicist friend for which I had no answer.

Consider one of the most basic mean field statistics, variance. Variance is fundamental to our understanding of selection. For example, the selection differential, S, is the covariance between a trait and relative fitness, which in turn is the product of the standard deviation of the trait, the standard deviation of relative fitness, and the correlation between the two:

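$$S = \mathrm{Cov}(z, w) = \sigma_z \, \sigma_w \, \rho_{zw}$$

where z is the trait value and w is relative fitness.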
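As a quick numerical check of this identity, here is a minimal sketch in Python. The trait and fitness values are made up for illustration, and the function names are my own. It uses the divide-by-n (population) form of the covariance and computes S both directly and as the product of the two standard deviations and the correlation.

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def cov(xs, ys):
    """Population (divide-by-n) covariance of two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def relative_fitness(fitness):
    """Absolute fitness divided by mean fitness, so the result has mean 1."""
    w_bar = mean(fitness)
    return [w / w_bar for w in fitness]

def selection_differential(trait, fitness):
    """S computed directly as Cov(trait, relative fitness)."""
    return cov(trait, relative_fitness(fitness))

def selection_differential_product(trait, fitness):
    """S computed as sd(trait) * sd(relative fitness) * correlation."""
    rel_w = relative_fitness(fitness)
    sd_z = math.sqrt(cov(trait, trait))
    sd_w = math.sqrt(cov(rel_w, rel_w))
    rho = cov(trait, rel_w) / (sd_z * sd_w)
    return sd_z * sd_w * rho

# Hypothetical data: four individuals, trait value and absolute fitness.
trait = [1.0, 2.0, 3.0, 4.0]
fitness = [1.0, 1.0, 2.0, 4.0]

s_direct = selection_differential(trait, fitness)
s_product = selection_differential_product(trait, fitness)
print(s_direct, s_product)
```

The two routes agree up to floating-point rounding, which is all the identity claims. Note that it treats every individual as competing against the whole population at once.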

Now, let’s think about what this means. As far as selection is concerned, it means that we are effectively lining everybody up, choosing those with the most favorable value of the trait, and discarding those that don’t measure up. In effect this is very much like the start of a race: everybody lines up at the start line, the gun goes off, and the runners can be ranked based on the time it takes them to cross the finish line.

[Figure: the start of a race]

The start of a race is a situation in which the variance in a trait, such as muscle mass, can be used with some confidence to predict the outcome of the race. The important point is that all are competing equally at the same time and under the same conditions. (http://www.dailymail.co.uk/travel/article-2176034/London-2012-Olympics-Things-London-Olympics.html)

Unfortunately, most of the world is not like this. Even in the world of sports it becomes more complicated. Consider the NCAA basketball tournament. In this tournament only pairwise contests are possible. Thus, a team is never competing against the entire field, but is instead competing against a single competitor. This means that for any given contest the performance of teams not in the contest is (at least for the moment) irrelevant. Thus, a measure of the variance in some trait, say team mean free-throw percentage, is irrelevant, whereas the difference in the means of the trait between the two teams is highly relevant. This has some interesting consequences. For example, it is not unusual for a low-ranked team to put everything it has into the opening game and defeat a top-ranked team, only to get trounced in the next game because it doesn’t have the depth to continue at that level of play.

[Figure: 2014 NCAA March Madness tournament results]

The results of the 2014 NCAA basketball tournament. Note that the championship was played between 7th ranked Connecticut and 8th ranked Kentucky. Also there are a number of interesting upsets, such as 3rd ranked Duke being defeated by 14th ranked Mercer. These anomalies indicate that summary statistics, such as variance, are not always predictive of the final results. (http://www.printyourbrackets.com/ncaa-march-madness-results-2014.html)

Turning to nature, it is the same thing. As far as our favorite gazelle is concerned, it doesn’t matter how fast cheetahs run on average, or how fast the fastest cheetah runs. What matters is how fast the cheetah that is chasing it can run. In a large panmictic population with random interactions, mean field summary statistics such as variance are indeed appropriate for predicting the response to selection. That situation is very much like the race example I started with, and we are justified in calculating S as the covariance between relative fitness and the trait value. But what do we do when selection is taking the form of a tournament or interactions are local?

The easy solution that I have used is to assume that the population is structured using an island model of migration. This is a metapopulation in which each subpopulation has random mating and random interactions. In addition, migration among subpopulations is random, with no effect of distance on the probability of migration. This is in effect a two-level mean field approximation in which we can use one set of variances to describe selection among individuals within subpopulations, and another set of variances to describe selection among subpopulations within the metapopulation. This is fine, and probably often a good approximation, but it is at least conceptually unsatisfying in continuous populations with localized interactions.
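To make the two-level idea concrete, here is a minimal sketch in Python of the standard variance partition for a subdivided population: the total (divide-by-n) variance equals the size-weighted mean of the within-subpopulation variances plus the size-weighted variance of the subpopulation means. The subpopulations and trait values are made up, and the function names are my own.

```python
def mean(xs):
    return sum(xs) / len(xs)

def variance(xs):
    """Population (divide-by-n) variance."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def partition_variance(demes):
    """Return (within, among) variance components for a list of subpopulations.

    Each subpopulation is weighted by its share of the total population.
    """
    n_total = sum(len(d) for d in demes)
    weights = [len(d) / n_total for d in demes]
    deme_means = [mean(d) for d in demes]
    grand_mean = sum(w * m for w, m in zip(weights, deme_means))
    within = sum(w * variance(d) for w, d in zip(weights, demes))
    among = sum(w * (m - grand_mean) ** 2 for w, m in zip(weights, deme_means))
    return within, among

# Hypothetical metapopulation: two subpopulations of two individuals each.
demes = [[1.0, 2.0], [3.0, 4.0]]
within, among = partition_variance(demes)
total = variance([z for d in demes for z in d])
print(within, among, total)
```

Both components are still ordinary mean field variances. The structure enters only through which individuals are pooled together.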

Another solution is to use the recently very popular network approach. I am honestly not sure how this works, so I will leave it to others rather than embarrassing myself. That said, I have concerns about this approach in the practical world of measuring plants and animals in natural populations where measuring connections may be difficult or impossible.

So, what is the answer here? The simple truth is that I don’t have one, but I do have some ideas. My thought is that we do something along the lines of a weighted mean and variance, and that we weight the variance by the probability that the interaction will occur. For example, if we have a continuous population the standard manner for calculating the variance in a particular trait in the population would be:

$$\sigma_z^2 = \sum_i p_i \,(z_i - \bar{z})^2, \qquad \bar{z} = \sum_i p_i z_i$$

Two things to note: (1) yup, I am using the MLE formulation of variance (dividing by n), not the unbiased one (dividing by n − 1). It may be biased, but from a theoretical perspective it is cleaner. And (2) $p_i$ can be thought of as the frequency of the ith type. My thought is that the $p_i$ can be any set of non-negative numbers, as long as they sum to one. Thus, we can replace $p_i$ with another value, say $q_i$, as long as the $q_i$ also sum to one. I suggest that we define a number, say $k_{ji}$ = the probability that individual j interacts with individual i, and:

$$\sum_i k_{ji} = 1$$

Thus, for each individual we would get a separate “local” mean and variance:

$$\bar{z}_j = \sum_i k_{ji} \, z_i$$

and:

$$\sigma_{z_j}^2 = \sum_i k_{ji} \,(z_i - \bar{z}_j)^2$$
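Here is a rough sketch in Python of what these local statistics look like in practice. It assumes the interaction probabilities are stored as a matrix with one row per focal individual j, and that each row is rescaled to sum to one. The trait values, the interaction matrix, and the function names are all made up for illustration.

```python
def normalize_rows(k):
    """Rescale each row of raw interaction weights so that it sums to one."""
    normalized = []
    for row in k:
        total = sum(row)
        normalized.append([v / total for v in row])
    return normalized

def local_mean(k_row, z):
    """Interaction-weighted mean trait value seen by one focal individual."""
    return sum(kji * zi for kji, zi in zip(k_row, z))

def local_variance(k_row, z):
    """Interaction-weighted (divide-by-n style) variance around the local mean."""
    m = local_mean(k_row, z)
    return sum(kji * (zi - m) ** 2 for kji, zi in zip(k_row, z))

# Hypothetical trait values for four individuals arranged along a line.
z = [1.0, 2.0, 3.0, 4.0]

# Hypothetical raw weights: each individual interacts only with itself
# and its immediate neighbors.
k_local = normalize_rows([
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 1, 1, 1],
    [0, 0, 1, 1],
])

# Everyone interacts with everyone equally: the mean field case.
k_uniform = normalize_rows([[1, 1, 1, 1] for _ in range(4)])

for j in range(len(z)):
    print(j, local_mean(k_local[j], z), local_variance(k_local[j], z))
```

With the uniform matrix every individual sees the ordinary population mean and variance, so the mean field statistics fall out as a special case. With local interactions each individual sees a different, and here smaller, variance.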

I will admit that at this point I am stuck and running out of space. However, my thought is that we could similarly calculate a local variance for relative fitness and a local covariance between local relative fitness and local phenotype. Summing across individuals (Σ Z_j) may well give a more meaningful estimate of the selection differential in the population. Estimating $k_{ji}$ might be difficult, but perhaps reasonable estimates could be obtained using home range distributions, or other behavioral measures.
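One way this might look, offered strictly as a sketch under assumptions I have not worked through: define local relative fitness for focal individual j as each neighbor's fitness divided by the interaction-weighted mean fitness that j sees, take the interaction-weighted covariance with the trait, and then average (rather than simply sum) those local covariances across focal individuals. All names and data below are made up, and averaging is my own choice so that the mean field case is recovered.

```python
def local_selection_differential(k_row, z, w):
    """Interaction-weighted covariance between trait and local relative fitness.

    Assumes k_row sums to one and that the local mean fitness is non-zero.
    """
    z_bar = sum(k * zi for k, zi in zip(k_row, z))
    w_bar = sum(k * wi for k, wi in zip(k_row, w))
    return sum(
        k * (zi - z_bar) * (wi / w_bar - 1.0)
        for k, zi, wi in zip(k_row, z, w)
    )

def mean_local_selection(k, z, w):
    """Average of the local covariances over all focal individuals (an assumption)."""
    return sum(local_selection_differential(row, z, w) for row in k) / len(k)

# Hypothetical data: trait values and absolute fitnesses of four individuals.
z = [1.0, 2.0, 3.0, 4.0]
w = [1.0, 1.0, 2.0, 4.0]

# Mean field case: everyone interacts with everyone equally.
k_uniform = [[0.25, 0.25, 0.25, 0.25] for _ in range(4)]

# Local case: each individual interacts with itself and its immediate neighbors.
k_local = [
    [1 / 2, 1 / 2, 0.0, 0.0],
    [1 / 3, 1 / 3, 1 / 3, 0.0],
    [0.0, 1 / 3, 1 / 3, 1 / 3],
    [0.0, 0.0, 1 / 2, 1 / 2],
]

s_mean_field = mean_local_selection(k_uniform, z, w)
s_local = mean_local_selection(k_local, z, w)
print(s_mean_field, s_local)
```

With uniform interactions this reduces to the ordinary covariance between the trait and relative fitness. With local interactions it gives a smaller value for these made-up numbers, because most of the fitness differences lie between individuals that never interact. Whether this is the right way to combine the local terms is exactly the part I am unsure about.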

Actually, there are people who are much more adept at such things than I am, so I am sure there is a better solution, but I just don’t know what it is.

 

 
