A week late and a dollar short, but let’s continue comparing matrices. Continuing on with my blatant endorsement of statistical methods attached to my name... Last time I talked about the “Rank”/“Signed Bartlett”/“Modified Mantel” tests for comparing the dimension, size, and shape of a pair of matrices. This is only one of several ways of comparing matrices. This set of tests has the advantage that it is basically non-parametric, and makes very few assumptions about the actual matrices. It is also useful because it directly compares matrices for easily interpretable differences. The problem with these tests is that in most cases we don’t so much care about whether a pair of matrices are the same or different as whether they have the same or different effects on the evolution of the organism.
Obviously the size, shape, and dimension of a covariance matrix will be related to the ability to respond to selection, but the relationship may not be perfect. Two other approaches that have been developed are “random skewers” (Cheverud 1996 J. Evol. Biol. 9:5-42; Cheverud and Marroig 2007 Genet. Mol. Biol. 30:461-469; Revell 2007 Evolution 61:1857-1872) and “selection skewers” (Calsbeek and Goodnight 2009. Evolution 63:2627-2635). To see what a random “skewer” is, consider that in a multivariate selection experiment the response to selection is given by:
$$R = G P^{-1} S = G\beta$$
Here S is the vector of selection differentials, P is the phenotypic covariance matrix, and β is a vector that describes the direct effects of selection on the different traits. The G matrix is sometimes thought of as a “rotation matrix” in that, while what it does from a biologist’s perspective is tell us the R vector, or response to selection, from a mathematician’s perspective what it does is rotate and warp the β vector. Thus, if we take any arbitrary β vector and multiply it by two different G matrices, the two matrices will rotate and stretch the β vector in different ways, producing two different R vectors. We can use this because if the two matrices are identical the two rotated vectors will be identical, whereas if the matrices are different the two rotated vectors will also be different. These can be compared by calculating the vector correlation between the two vectors. In linear algebra terms this is (I am SO sorry I am doing this to you!)
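Written out, and assuming the usual definition of a vector correlation (the cosine of the angle between the two response vectors), with $R_1 = G_1\beta$ and $R_2 = G_2\beta$ used here as illustrative labels for the two responses:

$$r = \frac{R_1^{T} R_2}{\sqrt{\left(R_1^{T} R_1\right)\left(R_2^{T} R_2\right)}}$$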
For those not adept at linear algebra (he said raising his hand), the numerator is really just a means of calculating a covariance between the two vectors, and the denominator is the square root of the product of the variances of the two vectors.
So, with the random skewers approach what you do is generate a large number (1000 or more) of random unit vectors. These represent a set of selection gradients in random directions. For each gradient you calculate the resulting R vector using each of your two matrices, and calculate the vector correlation between the two responses. If the average correlation is close to one, then the two matrices are effectively the same, whereas if it is well below one the two matrices are different.
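A minimal sketch of that recipe, in plain Python rather than the R script mentioned in this post. The function names and the two toy 2x2 matrices are illustrative assumptions, not taken from that script, and the random unit vectors are drawn by normalizing standard normal draws, which is one common choice.

```python
import math
import random

def mat_vec(matrix, vector):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

def vector_correlation(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))

def random_unit_vector(n_traits, rng):
    """Random direction: normal draws rescaled to unit length."""
    draws = [rng.gauss(0.0, 1.0) for _ in range(n_traits)]
    length = math.sqrt(sum(d * d for d in draws))
    return [d / length for d in draws]

def random_skewers(g1, g2, n_skewers=1000, seed=1):
    """Average vector correlation of the responses G1*beta and G2*beta."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_skewers):
        beta = random_unit_vector(len(g1), rng)
        total += vector_correlation(mat_vec(g1, beta), mat_vec(g2, beta))
    return total / n_skewers

# Toy matrices (made up): same variances, opposite sign of the covariance.
g_a = [[1.0, 0.5], [0.5, 1.0]]
g_b = [[1.0, -0.5], [-0.5, 1.0]]

same = random_skewers(g_a, g_a)       # a matrix compared with itself
different = random_skewers(g_a, g_b)  # two matrices with different shapes
print(same, different)
```

Comparing a matrix with itself gives an average correlation of one (up to rounding), while the two toy matrices give a clearly lower value.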
The question, of course, is how close to one is close enough. Here again the bootstrap comes in. Following the approach I outlined last time, we generate a large number of pairs of matrices that are estimated from bootstrap samples of the same data set. Because they are estimated from the same data set there can be no true difference, so if we calculate the average correlation for each of these pairs this will give us a distribution of the correlation when the null hypothesis is true. It is then a simple matter to compare the actual correlation with the bootstrap correlations. If the actual correlation is less than 95% (or whatever) of the bootstrap correlations then we can say that the two matrices are significantly different from each other.
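The bootstrap comparison could be sketched roughly as follows. This is an illustration under loud assumptions: plain sample covariance matrices of simulated raw data stand in for properly estimated G matrices, the null pairs are resampled from the first population only, and every name (`bootstrap_null`, `p_value`, `simulate`, and so on) is hypothetical rather than taken from the R script.

```python
import math
import random

def covariance_matrix(rows):
    """Sample covariance matrix (rows = individuals, columns = traits)."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
             for j in range(k)] for i in range(k)]

def mean_skewer_correlation(m1, m2, skewers):
    """Average vector correlation of the two responses over a set of skewers."""
    total = 0.0
    for beta in skewers:
        r1 = [sum(a * b for a, b in zip(row, beta)) for row in m1]
        r2 = [sum(a * b for a, b in zip(row, beta)) for row in m2]
        dot = sum(x * y for x, y in zip(r1, r2))
        total += dot / math.sqrt(sum(x * x for x in r1) * sum(y * y for y in r2))
    return total / len(skewers)

def make_skewers(n_traits, n_skewers, rng):
    """Random unit vectors (normalized normal draws)."""
    skewers = []
    for _ in range(n_skewers):
        draws = [rng.gauss(0.0, 1.0) for _ in range(n_traits)]
        length = math.sqrt(sum(d * d for d in draws))
        skewers.append([d / length for d in draws])
    return skewers

def bootstrap_null(rows, skewers, n_boot, rng):
    """Null distribution: both matrices of each pair come from ONE data set."""
    null = []
    for _ in range(n_boot):
        sample_a = [rng.choice(rows) for _ in rows]  # resample with replacement
        sample_b = [rng.choice(rows) for _ in rows]
        null.append(mean_skewer_correlation(covariance_matrix(sample_a),
                                            covariance_matrix(sample_b), skewers))
    return null

def p_value(observed, null):
    """Fraction of null correlations at or below the observed correlation."""
    return sum(1 for value in null if value <= observed) / len(null)

def simulate(n, slope, rng):
    """Made-up two-trait data with a chosen sign of the trait correlation."""
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        rows.append([x, slope * x + rng.gauss(0.0, 0.5)])
    return rows

rng = random.Random(42)
pop1 = simulate(60, 0.8, rng)    # positively correlated traits
pop2 = simulate(60, -0.8, rng)   # negatively correlated traits
skewers = make_skewers(2, 200, rng)
observed = mean_skewer_correlation(covariance_matrix(pop1),
                                   covariance_matrix(pop2), skewers)
null = bootstrap_null(pop1, skewers, 200, rng)
p = p_value(observed, null)
print(observed, min(null), p)
```

The test is one-sided: only an observed correlation that sits in the lower tail of the null distribution counts as evidence that the matrices differ.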
This is an interesting point. Here we are using the null hypothesis that the two matrices are identical. Thus, we set up the bootstrap such that the null hypothesis was true, and compared our actual correlation with the bootstrap correlation. In the original random skewers approach the opposite was the case. The null hypothesis was that the two matrices were uncorrelated, and thus those papers use a different approach to significance testing. I googled hard for a joke about getting null hypotheses backwards, but apparently this is too subtle for the online community.
The selection skewers approach is similar to random skewers, with a few important changes. This analysis is appropriate if you are specifically interested in comparing how two populations will respond to a particular selection pressure. For example, you may have two recently diverged populations and want to determine whether the two populations will respond in the same manner to a particular selection pressure. In most cases you will likely have a known S vector, which is the raw selection differential. This is what I assume in the program I provided. In this case you first need to generate the $\beta = P^{-1}S$ vector. Then, as with the random skewers, you calculate the vector correlation, and compare the actual correlation to the correlation in the bootstrap data sets when the null hypothesis is true.
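As a rough sketch of the selection skewers calculation in plain Python: the matrices and S vectors below are made up, the helper names are hypothetical, and a single P matrix shared by both populations is an assumption of this example rather than something the post specifies.

```python
import math

def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x

def mat_vec(matrix, vector):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

def vector_correlation(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))

def selection_skewer(g1, g2, p, s):
    """Return beta = P^-1 S and the correlation of the responses G1*beta, G2*beta."""
    beta = solve(p, s)
    return beta, vector_correlation(mat_vec(g1, beta), mat_vec(g2, beta))

# Made-up inputs; one P shared by both populations is an assumption here.
p_matrix = [[2.0, 0.0], [0.0, 4.0]]
g_a = [[1.0, 0.5], [0.5, 1.0]]
g_b = [[1.0, -0.5], [-0.5, 1.0]]

beta_1, corr_1 = selection_skewer(g_a, g_b, p_matrix, [1.0, 2.0])
beta_2, corr_2 = selection_skewer(g_a, g_b, p_matrix, [2.0, 0.0])
print(beta_1, corr_1)
print(beta_2, corr_2)
```

With the first S vector the gradient lies along an axis the two toy matrices share, so the responses point in the same direction even though the matrices differ; with the second S vector the responses diverge. Significance would then be judged against bootstrap correlations computed with the same fixed β.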
The nice thing about both the random skewers and the selection skewers is that they give a real world idea of what changes in shape can do. The random skewers approach is agnostic as to how selection actually works, whereas the selection skewers approach tests a specific selection regime. The latter is particularly interesting, since it is entirely possible for two matrices to have very different structures (as determined, say, by the rank/Bartlett’s/Mantel tests), and yet have this structural difference have very little actual effect on the response to selection. On the downside, however, the random and selection skewers lump a lot of information together. For example, it can be hard to determine whether a difference in response between two matrices is due to a difference in the total amount of available variation, or due to changes in the correlation structure leading to negative genetic correlations.
I guess the real lesson from all this is that there is no one best statistical test. Which is best depends on the question you ask. If you want detailed insights into the actual covariance matrices, the rank/Bartlett’s/Mantel tests may be best. If you want a summary of the difference in the ability to respond to selection, random skewers may be a good choice, and if you have a clear a priori selection hypothesis to test, the selection skewers approach is clearly the best.
To remind you, I have an R script that performs these tests and can be modified relatively easily for different data sets and circumstances.
Here is the program:
Writeup on how to use the program: Matrix comparison writeup
The program: Bootstrap command
Relevant example data sets: