Genome size and longevity in fishes -- the debate continues

Introduction
    Is genome size positively correlated with maximum lifespan in fishes? Griffith et al. (2003) think so, but I am much more skeptical. The whole genome size vs. longevity issue started a few years ago with Monaghan and Metcalfe's (2000) report of a positive correlation in birds. Morand and Ricklefs (2001) challenged their conclusions for birds (but did not evaluate the relationship in mammals, as Griffith et al. 2003 incorrectly suggest), and then I did a much larger analysis using the Animal Genome Size Database and could not find any correlation at all between genome size and longevity, or any other developmental parameter, in birds or mammals (Gregory 2002). Griffith et al. (2003) examined genome size and longevity in a handful of fishes and suggested a positive relationship. However, I considered their analysis to be quite problematic, and so I provided a response to their paper in the same journal (Gregory 2004).

In reply, Civetta et al. (2004) presented the following rebuttal which, in my opinion, further exposes the problems in their study. 

My subsequent responses are given in bold. 



Response to Gregory's Letter to the Editor: genome size and its correlation with longevity in fishes
A. Civetta, O.L. Griffith, and G.E.E. Moodie
Experimental Gerontology 39: 861-862

    Different adaptive explanations for the evolution of genome size have been attempted and a very controversial one is that differences in genome size are correlated with differences in longevity in both birds and fish (Monaghan and Metcalfe, 2000, 2001; Morand and Ricklefs, 2001; Gregory, 2002; Griffith et al., 2003; Gregory, 2004).
    In his letter, Gregory (2004) claims that we reported direct and phylogenetically corrected positive correlations between genome size and longevity in fishes (Griffith et al., 2003) and that these correlations are meaningless because they are non-significant. A problem with his criticism is that we never reported correlations using the entire data set but only on phylogenetically corrected data (Griffith et al., 2003).

    True.  I gave them credit for more analyses than they actually did, and instead I should have said "none of the correlations they tried was significant".  (In my view, they probably should have tried direct correlations as well, as discussed below.)  The one significant correlation they did report was the relationship in sturgeons, but that was with polyploidy and not genome size per se (Gregory 2004). 

Using a partial correlation analysis on our entire dataset, which is available through appendix 1 (see Griffith et al., 2003), a positive and significant correlation is found between genome size and longevity while controlling for differences in body mass (r = 0.36; p = 0.01; n = 113), but such correlation disappears when the more distant Chondrostei (Acipenseriformes) species are removed from the analysis. Removal of this distant group and focus on the teleosts yields a negative but non-significant correlation (r = -0.12; p = 0.22; n = 101).

    There is a major discrepancy here. Using all their appendix data, I got r = -0.01, p = 0.90 (Gregory 2004), but they got a significant positive correlation. The difference is that Civetta et al. (2004) did not log-transform their data prior to performing these correlation analyses. This is an important issue, because both the genome size and longevity data from their appendix are highly skewed, as shown in the histograms below. It is not entirely clear whether they log-transformed the data for the phylogenetic correlations in this letter, or in the original report either; if not, then there is strong reason to doubt those results as well.



    This means that the few, large genomes make an undue contribution to the correlation, and anchor it at the high end to make it positive. The relationship using untransformed data (Civetta et al. 2004) is shown on the left with three apparent "anchor points" indicated, and the one using log-transformed data (Gregory 2004) is on the right. These are plots of regression residuals of both longevity and genome size vs. mass (which gives the same result as a partial correlation). Griffith et al. (2003) argue that the genome size data do not need to be mass-corrected based on their covariance analysis, but by direct correlation there is a significant relationship between genome size and body mass in their dataset when log-transformed (r = 0.39, p < 0.0001, n = 113).
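    Both points, the leverage of a few extreme values on untransformed data and the equivalence of residual plots with a partial correlation, can be illustrated with a small sketch. The species values below are made up purely for illustration (they are not the Griffith et al. 2003 appendix data): five species show a mild negative trend, and one large-genome, long-lived species acts as an anchor point. The helper names are my own.

```python
import math
from statistics import mean


def pearson_r(x, y):
    """Pearson product-moment correlation."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def residuals(y, x):
    """Residuals from the ordinary least-squares line of y on x."""
    mx, my = mean(x), mean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return [b - (my + slope * (a - mx)) for a, b in zip(x, y)]


def partial_r(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    rxy, rxz, ryz = pearson_r(x, y), pearson_r(x, z), pearson_r(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


def log10_all(values):
    return [math.log10(v) for v in values]


# Hypothetical toy species (illustrative only, not real data).
genome_pg = [0.2, 0.5, 1.0, 2.0, 5.0, 20.0]
lifespan_yr = [20, 10, 8, 6, 4, 25]
mass_kg = [0.1, 0.4, 2.0, 1.0, 8.0, 30.0]

# 1) Skewed data: the single anchor species dominates the raw correlation.
raw_r = pearson_r(genome_pg, lifespan_yr)
log_r = pearson_r(log10_all(genome_pg), log10_all(lifespan_yr))

# 2) Correlating residuals (each variable regressed on mass) gives
#    exactly the partial correlation controlling for mass.
lg, ll, lm = log10_all(genome_pg), log10_all(lifespan_yr), log10_all(mass_kg)
resid_r = pearson_r(residuals(lg, lm), residuals(ll, lm))
formula_r = partial_r(lg, ll, lm)

print(f"untransformed r = {raw_r:.2f}, log-transformed r = {log_r:.2f}")
print(f"residual r = {resid_r:.4f}, partial r = {formula_r:.4f}")
```

    With these toy numbers the untransformed r comes out near 0.6, while on the log scale it is close to zero and slightly negative. The residual and formula routes agree to rounding error.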


 
    Is genome size negatively correlated with longevity after removing the more distant group of Chondrostei? The most conservative answer is that we simply do not know. Using all species of fishes or only the teleosts to establish correlations between genome size and longevity is problematic because many data points are not statistically independent due to shared evolutionary history. A solution to this problem is to test correlations using phylogenetic comparative methods (see Felsenstein, 1985). In fact, a recent simulation study has shown that correlation analysis of interspecific data, as done by Gregory (2004), without considering their phylogenetic relationship leads to very poor estimates compared to results obtained taking into account their phylogeny (Martins et al., 2002). These phylogenetic comparative methods perform better than nonphylogenetic approaches even when the assumptions of the methods are somehow violated by the evolutionary history of the traits being analyzed (Martins et al., 2002). For this and no other reason, we limited our correlation analysis to orders for which phylogenies were available (Acipenseriformes, Cypriniformes and Salmoniformes) (Griffith et al., 2003). For all these orders, we detected a positive but non-significant correlation between genome size and longevity for phylogenetically independent contrasts (r = 0.30; p = 0.17; n = 22) (Griffith et al., 2003).
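    For readers unfamiliar with phylogenetically independent contrasts (Felsenstein 1985), here is a minimal sketch of the idea. It assumes a fully resolved (bifurcating) tree with known branch lengths, traits that are already log-transformed, and evolution by Brownian motion, which is the assumption Martins et al. (2002) found PICs to be sensitive to. The tree, species labels and trait values are hypothetical. As is usual for contrasts, the correlation is computed through the origin.

```python
import math

# Tree nodes are (label, branch_length); a label is either a species name
# (leaf) or a (left_child, right_child) pair (internal node).
# Hypothetical tree and values, for illustration only.
TREE = (
    (
        ((("sp_A", 1.0), ("sp_B", 1.0)), 0.5),
        ((("sp_C", 1.0), ("sp_D", 2.0)), 0.5),
    ),
    0.0,  # root branch length (unused)
)

# species -> (log genome size, log longevity); made-up values
TRAITS = {
    "sp_A": (0.10, 1.20),
    "sp_B": (0.30, 1.00),
    "sp_C": (0.60, 1.50),
    "sp_D": (0.20, 1.40),
}


def contrasts(node, traits):
    """Felsenstein (1985) contrasts for a fully bifurcating tree.

    Returns (node_value, adjusted_branch_length, contrasts), where each
    contrast holds one standardized difference per trait.
    """
    label, length = node
    if isinstance(label, str):
        return traits[label], length, []
    left, right = label
    xl, vl, cl = contrasts(left, traits)
    xr, vr, cr = contrasts(right, traits)
    scale = math.sqrt(vl + vr)
    contrast = tuple((a - b) / scale for a, b in zip(xl, xr))
    # Ancestral estimate: weighted mean favouring the child on the shorter branch.
    node_value = tuple((a / vl + b / vr) / (1 / vl + 1 / vr) for a, b in zip(xl, xr))
    # Lengthen the branch above this node to reflect estimation error.
    adjusted = length + vl * vr / (vl + vr)
    return node_value, adjusted, cl + cr + [contrast]


def origin_r(cs, i=0, j=1):
    """Correlation of contrasts for traits i and j, computed through the origin."""
    sxy = sum(c[i] * c[j] for c in cs)
    sxx = sum(c[i] ** 2 for c in cs)
    syy = sum(c[j] ** 2 for c in cs)
    return sxy / math.sqrt(sxx * syy)


_, _, CONTRASTS = contrasts(TREE, TRAITS)
print(len(CONTRASTS), "contrasts; r through origin =", round(origin_r(CONTRASTS), 3))
```

    Note that n species give only n - 1 contrasts, and every one of them depends on the tree being right, which is part of why the accuracy of the phylogeny matters so much.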

    It shouldn't really need stating that one computer simulation study is not at all conclusive, since the results depend heavily on the model design. As far as studies using real genome size data go, there are several examples where little discrepancy is found between direct and phylogenetically corrected correlations (Ricklefs and Starck 1996; Morand and Ricklefs 2001; Gregory 2002, 2003). In fact, one wonders how carefully Griffith et al. (2003) and Civetta et al. (2004) read the Martins et al. (2002) paper, because the method they used (Felsenstein's phylogenetically independent contrasts, or PICs) is the one correction method tested that performs very poorly when its assumptions are seriously violated (i.e., when the features in question are not evolving under a model of Brownian motion with no evolutionary constraints). According to Martins et al. (2002), PICs may also be problematic when using a small dataset (as Griffith et al. 2003 did).
   
There is also the issue that the phylogeny must be accurate, and that even with a perfect phylogeny and flawless correction method, there is still a taxonomic bias when most of the data come from one or a few groups (i.e., more of the independent contrasts will be from heavily sampled taxa).  It should also be noted that direct correlations in previous studies of this kind have generally been done using a taxonomically hierarchical correlation approach, which re-assorts the variation at each level and also helps to remove the problem of non-independence (e.g., Vinogradov  1995; Gregory 2002), so it is inaccurate to imply that past authors did not consider this issue.
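    One simple way to picture a taxonomically hierarchical analysis is to average species values within each taxon at a given level (genus, family, order) and then correlate the taxon means, so that a heavily sampled group contributes one point per taxon rather than many. The sketch below is my own simplified illustration under that assumption, not necessarily the exact procedure of Vinogradov (1995) or Gregory (2002); the records and values are hypothetical, and order means are taken over species rather than over family means.

```python
import math
from collections import defaultdict
from statistics import mean


def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def taxon_means(records, level):
    """Average the (log) traits of all species within each taxon at `level`."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[level]].append((rec["genome"], rec["life"]))
    return {taxon: (mean(g for g, _ in vals), mean(lf for _, lf in vals))
            for taxon, vals in groups.items()}


# Hypothetical species records with made-up log-scale values.
RECORDS = [
    {"species": "s1", "family": "F1", "order": "O1", "genome": 0.10, "life": 1.00},
    {"species": "s2", "family": "F1", "order": "O1", "genome": 0.20, "life": 1.10},
    {"species": "s3", "family": "F2", "order": "O1", "genome": 0.30, "life": 0.90},
    {"species": "s4", "family": "F3", "order": "O2", "genome": 0.50, "life": 1.30},
    {"species": "s5", "family": "F3", "order": "O2", "genome": 0.40, "life": 1.20},
    {"species": "s6", "family": "F4", "order": "O2", "genome": 0.70, "life": 1.00},
    {"species": "s7", "family": "F5", "order": "O3", "genome": 0.90, "life": 1.50},
    {"species": "s8", "family": "F5", "order": "O3", "genome": 1.00, "life": 1.40},
]

for level in ("species", "family", "order"):
    points = list(taxon_means(RECORDS, level).values())
    r = pearson_r([p[0] for p in points], [p[1] for p in points])
    print(f"{level}: n = {len(points)}, r = {r:.2f}")
```

    A relationship that is real should tend to show up consistently across levels; one that appears only at a single level deserves suspicion.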
    In any case, the point of phylogenetic correction is that treating species data as independent inflates the degrees of freedom and increases the chance of a Type I error (i.e., concluding that there is a relationship when really there is not). This is essentially the result of the Martins et al. (2002) analysis, and was the basis for the development of phylogenetic correlation methods. This was the point of my letter: I used direct correlations, which if anything should be too likely to yield a significant result, and still could not find one.
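    The Type I error argument can be checked with a deliberately crude simulation. Two traits are generated independently, but species belong to two clades, and each clade shifts each trait by its own random offset (a stand-in for shared history). Treating the species as independent points then flags a "significant" correlation far more often than the nominal 5%. The two-clade setup, the Gaussian traits and the Fisher z approximation to the significance test are all simplifying assumptions of this sketch.

```python
import math
import random
from statistics import mean


def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def looks_significant(r, n, z_crit=1.96):
    """Approximate two-sided test at alpha = 0.05 using Fisher's z transform."""
    return abs(math.atanh(r)) * math.sqrt(n - 3) > z_crit


def false_positive_rate(clade_sd, reps=1000, per_clade=15, seed=1):
    """Share of replicates in which two independent traits look correlated."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        xs, ys = [], []
        for _clade in range(2):
            # Each clade has its own, independent offset for each trait.
            shift_x, shift_y = rng.gauss(0, clade_sd), rng.gauss(0, clade_sd)
            for _species in range(per_clade):
                xs.append(shift_x + rng.gauss(0, 1))
                ys.append(shift_y + rng.gauss(0, 1))
        if looks_significant(pearson_r(xs, ys), len(xs)):
            hits += 1
    return hits / reps


print("no clade structure:", false_positive_rate(clade_sd=0.0))
print("strong clade structure:", false_positive_rate(clade_sd=3.0))
```

    Without clade structure the rate should sit near 0.05; with strong clade structure it is many times higher. Non-independence makes direct correlations too eager to find a relationship, not too reluctant.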

 There is such a thing as a non-significant positive correlation. A positive correlation with a P value of 0.051 is not less positive than one with P value of 0.049.


    I would agree at this level (both round to p = 0.05), but this is irrelevant because they did not have P-values anywhere near the slightly grey area immediately surrounding p = 0.05.  Theirs were between 0.15 and 0.60, which was the source of my complaint about the use of the term "non-significant positive correlations".  The question is whether these actual results, not a hypothetical "marginal" relationship, can fairly be called "positive correlations".  They obviously assume a priori that such a relationship exists, because they explain the lack of significance as being due to a reduction in statistical power when phylogenetically corrected.  But they did not do any direct correlation analyses, so it is not possible to say whether the correlation fails to hold up when corrected because of this effect or simply because there is no relationship.
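    To put these p-values in context, the sketch below converts a correlation r from n points into a two-sided p-value using the standard t transformation, t = r * sqrt(df / (1 - r^2)), with df = n - 2 - k for a partial correlation controlling for k variables. To keep it free of third-party libraries, the t distribution is integrated numerically with Simpson's rule; that implementation choice is mine and not taken from either paper.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    c = math.exp(log_c) / math.sqrt(df * math.pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=2000):
    """Two-sided p-value, P(|T| >= |t|), via Simpson's rule on [0, |t|]."""
    b = abs(t)
    h = b / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3  # P(0 <= T <= |t|)
    return max(0.0, 1.0 - 2.0 * central_area)


def correlation_p(r, n, n_controlled=0):
    """p-value for a (partial) correlation r from n points, k covariates."""
    df = n - 2 - n_controlled
    t = r * math.sqrt(df / (1 - r * r))
    return two_sided_p(t, df)


for r, n in [(0.30, 22), (0.10, 22)]:
    print(f"r = {r:.2f}, n = {n}: p = {correlation_p(r, n):.3f}")
```

    For the contrast correlation reported by Griffith et al. (2003), r = 0.30 with n = 22, this should come out near 0.17, in line with the published value and nowhere near the 0.05 boundary.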

    Gregory also criticizes our choice of taxa. We agree that it is problematic to perform analysis across infraclasses [emphasis added]. This is why we used analysis of covariance to test for the effect of genome size on longevity while controlling for different orders, and why we did not report correlations using the entire dataset as Gregory does (Gregory, 2004).

    And yet, Civetta et al. (2004) do new analyses across infraclasses using both direct and phylogenetic correlations! It bears noting that phylogenetic correction does not fix the problem, because this still involves using data from across infraclasses in the same analysis. In fact, this might make the problem worse, because now there are independent contrasts being made directly between infraclasses at the deeper nodes.

Our use of maximum body size can be simply explained by the well-known fact that fishes have indeterminate growth. Using mean weight values for fish would be more problematic than using maximum values. Mammals may indeed be better dealt with on the basis of mean values because they have determinate growth. So his imagined comparison based on a human value of 500 kg is irrelevant.

    Indeterminate growth or not, how many rainbow trout actually weigh 25.4 kg?  On average, rainbow trout probably weigh closer to 0.5-2 kg.  How often does one encounter a 3 kg goldfish?  My point was that we want our analyses to be based on biologically meaningful data, and that using such extremes (including maximum recorded lifespan, for that matter) may not reflect what is happening with the vast majority of real-life organisms.  In my opinion, using 25 kg for the trout mass value is not any more realistic than using 500 kg for humans, despite differences in their developmental programs.

    His criticism of the fact that the significant positive correlation between mass-corrected longevity and genome size within the order Acipenseriformes could be biased due to polyploidy is relevant and deserves consideration. Using species of the Acipenseriformes order listed in Appendix 1 (Griffith et al., 2003) and controlling for both body weight and chromosome number results in a non-significant positive correlation between c-value and longevity (r = 0.51; p = 0.20; n = 10).

    My result using Griffith et al.'s (2003) appendix, log-transformed, with some updates on longevity from FishBase and genome size and chromosome numbers from the Animal Genome Size Database, gave r = -0.76, p = 0.03. Perhaps this major discrepancy is again based on a lack of log-transformation by Civetta et al. (2004). I was only able to come up with n = 8 data points for which all four necessary parameters (DNA content, chromosome number, body mass, and longevity) were available. They seem to have found two more, but did not specify where these came from, so I can't double-check whether these made such a big difference. If so, then I'd say the relationship is very unreliable, since adding two new points changes it completely.
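    Controlling for body mass and chromosome number at the same time is a second-order partial correlation. A minimal sketch follows, with made-up values for eight hypothetical species (not the actual n = 8 dataset), everything log-transformed first. The recursive formula is cross-checked against the residual route (regress out mass, then chromosome number), which should give the same answer.

```python
import math
from statistics import mean


def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def residuals(y, x):
    """Residuals from the ordinary least-squares line of y on x."""
    mx, my = mean(x), mean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return [b - (my + slope * (a - mx)) for a, b in zip(x, y)]


def partial_r(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    rxy, rxz, ryz = pearson_r(x, y), pearson_r(x, z), pearson_r(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


def partial_r2(x, y, z, w):
    """Second-order partial correlation of x and y, controlling for z and w."""
    rxy_z = partial_r(x, y, z)
    rxw_z = partial_r(x, w, z)
    ryw_z = partial_r(y, w, z)
    return (rxy_z - rxw_z * ryw_z) / math.sqrt((1 - rxw_z ** 2) * (1 - ryw_z ** 2))


def log10_all(values):
    return [math.log10(v) for v in values]


# Hypothetical values for eight species (illustrative only).
dna_pg = [3.2, 4.1, 8.5, 6.0, 3.9, 12.8, 4.5, 7.2]
lifespan_yr = [50, 60, 150, 80, 45, 100, 70, 118]
mass_kg = [20, 35, 200, 80, 25, 300, 50, 150]
chromosomes = [120, 120, 250, 240, 116, 360, 120, 250]

g = log10_all(dna_pg)
life = log10_all(lifespan_yr)
m = log10_all(mass_kg)
c = log10_all(chromosomes)

r_formula = partial_r2(g, life, m, c)

# Cross-check: residualize on mass, then on (mass-adjusted) chromosome number.
c_on_m = residuals(c, m)
r_resid = pearson_r(residuals(residuals(g, m), c_on_m),
                    residuals(residuals(life, m), c_on_m))
print(f"partial r = {r_formula:.3f} (residual check = {r_resid:.3f})")
```

    With only eight points and two covariates removed, each species has a great deal of leverage on the result.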

The problem of including non-phylogenetically independent data still remains in this dataset. The limitation due to the lack of a well-resolved phylogeny for the orders used in our original study (Griffith et al., 2003) is one that should receive closer attention in future comparative studies across species. Using sequence data from mitochondrial cytochrome b available in GenBank (http://www.ncbi.nlm.nih.gov) for species belonging to the Acipenseriformes, Cypriniformes and Salmoniformes, a preliminary gene tree can be used to perform phylogenetically independent contrasts between differences in c-value, maximum age, body mass and chromosome numbers. A partial correlation between differences in genome size and longevity while controlling for differences in body mass and chromosome number results in a significant and positive correlation (r = 0.37; p = 0.034; n = 35). Caution should be exercised due to the fact that a cytochrome b gene tree does not necessarily reflect the true species phylogeny. However, we found only positive partial correlations among differences in genome size and longevity when we tried different tree topologies and branch lengths.

    In summary, the use of our entire dataset (see Appendix 1, Griffith et al., 2003) shows only a positive correlation between genome size and longevity for phylogenetically independent comparisons after controlling for differences in body mass and chromosome number. Gregory’s finding of a significant negative correlation within the teleosts and using only Acipenseriformes is flawed by the problem of comparing non-phylogenetically independent data points which will produce less accurate results than correlation analysis in the context of phylogenetic corrections (Martins et al., 2002).

    They don't specify which species were used in this analysis, where they got their chromosome number data, whether they log-transformed, or how reliable their tree topology was. Plus, this involves a direct comparison across infraclasses, which was one of the criticisms they apparently agreed with (see above). Most importantly, correcting for chromosome number across all the fish taxa is not the point; it's within groups with the same basic chromosome number and basic genome size that this is most important. I wasn't arguing that the difference in DNA content between sturgeons and teleosts was due entirely to polyploidy in the former (although this is a definite concern). Rather, the point is that finding significant correlations within the Acipenseriformes (which is the only place they found one) is problematic, because here the differences in DNA content are most likely due to polyploidy. I would recommend correcting for chromosome number if a comparison were being done across teleosts including salmonids, cyprinids, and mostly diploid orders, but I do not think that correcting for chromosome number removes the problem of comparing sturgeons with teleost groups.

Add to this his discomfort with the use of maximum lifespan data; and one wonders why Gregory is willing to use this data set to suggest his potential negative relationship between genome size and longevity in fishes.

    I was very tentative in this suggestion: "[the correlation] is actually negative ... (at least by direct correlation in a rather small sample)" and "the potential negative relationship in fishes is interesting and may warrant further investigation -- using a broad dataset without such complicating factors as polyploidy and distant relatedness" [emphasis added].  Overall, I am not convinced that there is any noteworthy correlation in either direction, or else I would have done a full analysis using the database, as I did with mammals and birds.

    Correlation studies will constantly face the brick wall of being unable to make a connection between an interesting observation and its causal explanation. Many studies on the evolution of genome size that use correlation approaches are particularly vulnerable to this problem as experimental genome size manipulations are not feasible.

    The causes of possible relationships between genome size and lifespan have nothing to do with this discussion.  The question is whether any such correlations even exist.  (Note that Griffith et al. 2003 do briefly discuss "causation" -- by listing correlations between genome size and cell cycle and metabolic parameters!).


In the meantime, studies about the evolution of genome size will certainly benefit from more accurate measures of genome size, than the currently used literature collection of picogram per diploid cell (Gregory, 2001), coming out of genome sequencing projects and the analysis of interspecies data in the context of well-resolved species phylogenies.

    While the personal motivation for this shot at the database is pretty transparent, I will respond to it because it exposes some interesting misunderstandings about genome sizing methodology and other concepts.
    First, the database does not list "picogram[s] per diploid cell", it lists picograms per haploid genome, in keeping with the definition of "genome size".  Griffith et al. (2003) were evidently confused about this, giving haploid ranges for fishes but diploid ones for birds.
    Second, their implied point that, say, "base pairs" would be more accurate than "picograms" is misguided, because these two units are directly inter-convertible using a formula clearly posted on the database.
    Third, if Civetta et al. (2004) think this "literature collection" is so problematic, why do they base their entire study on it? (For the record, the online genome size database is a compilation of published data from nearly 400 sources, far more if one also considers the Plant DNA C-values Database assembled by my friends at RBG Kew.)
    Fourth, it is extraordinarily unlikely that genome sequencing will become a common method of analyzing genome size anytime soon, for at least three reasons: 1) it costs far too much to be an efficient way to get genome size data, 2) only small (or medically/agriculturally important) genomes are so far being chosen for sequencing due to this cost (and, in fact, this is why members of the genome sequencing community are frequent visitors to the database), and 3) sequencing projects are usually not "complete", which means they are not actually very accurate for estimating genome size (e.g., Bennett et al. 2003). Furthermore, the reality is that any genome size data produced by genome sequencing simply will be added to the database.
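    On the units question raised in the first two points: picograms and base pairs are interchangeable, and a haploid (1C) value is half the diploid (2C) amount. The exact formula posted on the database is not reproduced here, so the sketch below assumes the widely used factor of 1 pg of DNA being roughly 978 Mb; the function names are hypothetical.

```python
# Assumed conversion factor: 1 pg of DNA ~ 978 Mb (0.978e9 base pairs).
BP_PER_PG = 0.978e9


def pg_to_bp(pg):
    """Convert a DNA mass in picograms to a number of base pairs."""
    return pg * BP_PER_PG


def bp_to_pg(bp):
    """Convert a number of base pairs to a DNA mass in picograms."""
    return bp / BP_PER_PG


def haploid_pg(diploid_pg):
    """Genome size (1C, haploid) from a diploid (2C) nuclear DNA amount."""
    return diploid_pg / 2


# Arabidopsis at ~157 Mb (Bennett et al. 2003) expressed in picograms.
print(f"{bp_to_pg(157e6):.3f} pg")
# A 2C amount of 2.0 pg corresponds to a 1C genome of about 978 Mb.
print(f"{pg_to_bp(haploid_pg(2.0)) / 1e6:.0f} Mb")
```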

    In summary, the dataset used in the fish case was quite selective, and was rather ill-chosen because it includes the confounding factor of polyploidy. Moreover, there are many problems with the analyses involved which raise significant questions about the reliability of the results. Taken together, this leads me to the conclusion that there are no convincing relationships between genome size and longevity in fishes, and indeed not in any other groups except perhaps at the family level in birds (although not under my larger analysis, and if at all then only covering a very small percentage of the variance in the data) and at the suborder level in reptiles (although this involves only a very small number of data and does not show up at other levels; Olmo 2003).


 
 
References

Bennett, M.D., Leitch, I.J., Price, H.J., and Johnston, J.S., 2003. Comparisons with Caenorhabditis (~100 Mb) and Drosophila (~175 Mb) using flow cytometry show genome size in Arabidopsis to be ~157 Mb and thus ~25 % larger than the Arabidopsis Genome Initiative estimate of ~125 Mb. Annals Bot. 91: 547-557.

Civetta, A., Griffith, O.L., and Moodie, G.E.E., 2004. Genome size and its correlation with longevity in fishes.  Exp. Gerontol. 39: 861-862.

Felsenstein, J., 1985. Phylogenies and the comparative method. Am. Nat. 125: 1-15.

Gregory, T.R., 2001. Animal Genome Size Database, http://www.genomesize.com

Gregory, T.R., 2002. Genome size and developmental parameters in the homeothermic vertebrates. Genome 45: 833–838.

Gregory, T.R., 2003. Variation across amphibian species in the size of the nuclear genome supports a pluralistic, hierarchical approach to the C-value enigma. Biol. J. Linn. Soc. 79: 329-339.

Gregory, T.R., 2004. Genome size is not correlated positively with longevity in fishes (or homeotherms). Exp. Gerontol. 39: 859-860.

Griffith, O.L., Moodie, G.E.E., and Civetta, A., 2003. Genome size and longevity in fish. Exp. Gerontol. 38: 333–337.

Martins, E.P., Diniz-Filho, J.A.F., and Housworth, E.A., 2002. Adaptive constraints and the phylogenetic comparative method: a computer simulation test. Evolution 56: 1–13.

Monaghan, P., and Metcalfe, N.B., 2000. Genome size and longevity. Trends Genet. 16: 331–332.

Monaghan, P., and Metcalfe, N.B., 2001. Genome size, longevity and development in birds. Trends Genet. 17: 568.

Morand, S., and Ricklefs, R.E., 2001. Genome size, longevity and development in birds. Trends Genet. 17: 567–568.

Olmo, E., 2003. Reptiles: a group of transition in the evolution of genome size and of the nucleotypic effect. Cytogenet. Genome Res. 101: 166-171.

Ricklefs, R.E., and Starck, J.M., 1996. Applications of phylogenetically independent contrasts: a mixed progress report. Oikos 77: 167-172.

Vinogradov, A.E., 1995.  Nucleotypic effect in homeotherms: body mass-corrected basal metabolic rate of mammals is related to genome size.  Evolution 49: 1249-1259.


Last updated May 8, 2004.