Genome size and longevity in fishes -- the debate continues
Introduction
Is genome size positively correlated with maximum lifespan in
fishes? Griffith et al. (2003) think so, but I am much more
skeptical. The whole genome size vs. longevity issue started a
few years ago with Monaghan and Metcalfe's (2000) report of a positive correlation in
birds. Morand
and Ricklefs (2001) challenged their conclusions for birds (but did
not evaluate the relationship in mammals, as Griffith et al. 2003
incorrectly suggest), and then I did a much larger analysis using the Animal Genome Size Database and
could not find any correlations at all between genome size and
longevity or
any other developmental parameters in birds or mammals (Gregory 2002).
Griffith
et al. (2003) examined genome size and longevity in a handful of
fishes and suggested a positive relationship. However, I
considered their analysis to be quite problematic, and so I provided a
response to their paper in the same journal (Gregory 2004).
In reply, Civetta et al. (2004) presented the following rebuttal which,
in my opinion, further exposes the problems in their study. My
subsequent responses are given
in bold.
Response to Gregory's Letter to the Editor:
genome size and its correlation with
longevity in fishes
A. Civetta, O.L. Griffith, and G.E.E. Moodie Experimental Gerontology
39: 861-862
Different adaptive explanations for
the evolution of genome size
have been attempted and a very controversial one is that differences in genome
size are
correlated with differences in longevity in both birds
and fish (Monaghan and Metcalfe, 2000, 2001; Morand
and Ricklefs, 2001; Gregory, 2002; Griffith et al., 2003; Gregory,
2004). In
his letter, Gregory (2004) claims that we reported
direct and phylogenetically corrected positive correlations between
genome size
and longevity in fishes (Griffith et al., 2003) and that these
correlations are
meaningless because they
are non-significant. A problem with his criticism is that we never reported
correlations using
the entire data set but only on phylogenetically corrected data
(Griffith et
al., 2003).
True. I gave them credit for more analyses than they actually
did, and instead I should have said "none of the correlations they
tried was
significant". (In my
view, they probably should
have
tried direct correlations as well, as discussed below.) The one significant correlation they did
report was the relationship
in sturgeons, but that was with polyploidy and not genome
size per se (Gregory
2004).
Using a partial correlation analysis on our entire dataset, which
is available through appendix 1 (see Griffith et al., 2003), a positive
and
significant correlation is found between genome size and longevity
while
controlling for differences in body mass (r = 0.36; p = 0.01; n = 113), but
such correlation disappears when the more distant Chondrostei
(Acipenseriformes) species are removed from the analysis. Removal of
this
distant group and focus on the teleosts yields a negative but
non-significant
correlation (r = -0.12; p = 0.22; n = 101).
There is a major discrepancy here. Using all their appendix data, I got r = -0.01,
p = 0.90 (Gregory 2004), but they got a significant positive
correlation. The difference is that Civetta et al. (2004) did not log-transform their data
prior to performing these correlation analyses. This is a significant
issue, because both the genome size and longevity data from their
appendix are highly skewed,
as shown in the histograms below. It is not entirely clear whether they
log-transformed the data for the phylogenetic correlations in this
letter or in the original report either -- if not, then there is strong
reason to doubt those results as well.
This
means that the few, large genomes make an undue contribution to the
correlation, and anchor it at the high end to make it positive.
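To make the anchoring effect concrete, here is a minimal Python sketch. The numbers are invented for illustration and are not the Griffith et al. (2003) appendix data; the names `genome_pg`, `lifespan_yr`, `anchor_pg` and `anchor_yr` are hypothetical. It only shows how a few large values can dominate a correlation on the raw scale and lose most of that influence after log-transformation.

```python
import math

def pearson(x, y):
    """Plain Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical values for illustration only (NOT the appendix data).
# Ten small genomes whose lifespans trend downward with genome size...
genome_pg = [0.5, 0.6, 0.7, 0.8, 1.0, 1.2, 1.4, 1.6, 1.8, 2.0]
lifespan_yr = [40, 30, 35, 20, 25, 12, 15, 8, 10, 5]
# ...plus three much larger genomes with moderately long lifespans.
anchor_pg = [6.0, 8.0, 10.0]
anchor_yr = [30, 40, 50]

all_pg = genome_pg + anchor_pg
all_yr = lifespan_yr + anchor_yr

r_bulk = pearson(genome_pg, lifespan_yr)            # without the anchors
r_raw = pearson(all_pg, all_yr)                      # untransformed
r_log = pearson([math.log10(v) for v in all_pg],
                [math.log10(v) for v in all_yr])     # log-transformed

print(f"small genomes only: r = {r_bulk:.2f}")
print(f"all points, raw:    r = {r_raw:.2f}")
print(f"all points, log10:  r = {r_log:.2f}")
```

In this toy dataset the three large values turn a negative trend into a clearly positive raw correlation, while the log-transformed correlation is much weaker. Whether the same happens in a real dataset depends on the data, which is why the transformation needs to be stated.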
The relationship using untransformed data (Civetta et al. 2004) is
shown on the left with three apparent "anchor points" indicated, and
the one using log-transformed data (Gregory 2004) is on the
right. These are plots of regression residuals of both longevity
and genome size vs. mass (which gives the same results as a partial
correlation). Griffith et al. (2003) argue that the genome size
data do not need to be mass-corrected based on their covariance
analysis, but by direct correlation there is a significant relationship
between the two in their dataset when log-transformed (r = 0.39, p <
0.0001, n = 113).
Is genome size negatively correlated with longevity after removing
the more distant group of Chondrostei? The most conservative answer is
that we
simply do not know. Using all species of fishes or only the teleosts to
establish correlations between genome size and longevity is
problematic
because many data points are not statistically independent due to
shared
evolutionary history. A solution to this problem is to test
correlations using
phylogenetic comparative methods (see Felsenstein, 1985). In fact, a
recent
simulation study has shown that correlation analysis of interspecific
data, as
done by Gregory (2004), without considering their phylogenetic
relationship leads to very poor estimates compared to results obtained
taking
into account their phylogeny (Martins et al., 2002). These phylogenetic
comparative methods perform better than nonphylogenetic approaches even
when
the assumptions of the methods are somehow violated by the evolutionary
history
of the traits being analyzed (Martins et al., 2002). For this and no
other
reason, we limited our correlation analysis to orders for which
phylogenies
were available (Acipenseriformes, Cypriniformes and Salmoniformes) (Griffith et al., 2003).
For all these orders, we
detected a positive but non-significant correlation between genome size
and
longevity for phylogenetically independent contrasts (r
= 0.30; p = 0.17; n = 22) (Griffith
et al., 2003).
It shouldn't really need stating
that one computer simulation study is not at all conclusive, since the
results
depend
heavily on the model design. As far as studies using real genome size data go,
there are
several examples where little discrepancy is found between direct
and phylogenetically corrected correlations (Ricklefs and Starck 1996; Morand and
Ricklefs 2001; Gregory 2002,
2003). In fact, one wonders how carefully Griffith et al. (2003)
and Civetta et al. (2004) read the Martins et al. (2002) paper, because
the method they used (Felsenstein's PICs) is the one correction method
tested that performs very poorly when its assumptions are seriously
violated (i.e., when the features in question are not evolving under a
model of Brownian motion with no evolutionary constraints).
According to Martins et al. (2002),
PICs may also be problematic when using a small dataset (as Griffith et
al. 2003 did). There is also the issue that the phylogeny
must be
accurate, and that even with a perfect phylogeny and flawless
correction method, there is still a taxonomic bias when most
of the
data come from one or a few groups (i.e., more of the independent
contrasts will be from heavily sampled taxa). It should also be
noted that direct correlations in previous studies of this
kind have generally been done using a taxonomically hierarchical
correlation approach, which re-assorts the
variation at each level and also helps to remove the problem of
non-independence (e.g., Vinogradov 1995; Gregory 2002), so it is inaccurate to imply that past
authors did not consider this issue.
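For readers unfamiliar with the method under discussion, the following is a minimal Python sketch of Felsenstein's (1985) independent contrasts on a fully bifurcating tree, assuming Brownian motion and known branch lengths. The tree, the species names `A` to `D`, and the trait values are hypothetical, and the nested-tuple tree encoding is just a convenience for this example. It is not the analysis run by either party.

```python
import math

def contrasts(node, trait):
    """Return (node value, extra branch length, list of standardized contrasts).

    A leaf is a species name (str). An internal node is a pair
    ((left, left_length), (right, right_length)).
    """
    if isinstance(node, str):
        return trait[node], 0.0, []
    (left, vl), (right, vr) = node
    xl, extra_l, cl = contrasts(left, trait)
    xr, extra_r, cr = contrasts(right, trait)
    # Branches below an estimated (internal) node are lengthened to reflect
    # the uncertainty in that node's estimated value.
    vl += extra_l
    vr += extra_r
    contrast = (xl - xr) / math.sqrt(vl + vr)            # standardized contrast
    value = (xl / vl + xr / vr) / (1 / vl + 1 / vr)      # weighted node estimate
    extra = vl * vr / (vl + vr)                          # added to parent branch
    return value, extra, cl + cr + [contrast]

def origin_correlation(cx, cy):
    """Correlation of contrasts computed through the origin."""
    sxy = sum(a * b for a, b in zip(cx, cy))
    return sxy / math.sqrt(sum(a * a for a in cx) * sum(b * b for b in cy))

# Hypothetical four-species tree with unit branch lengths.
clade_ab = (("A", 1.0), ("B", 1.0))
clade_cd = (("C", 1.0), ("D", 1.0))
tree = ((clade_ab, 1.0), (clade_cd, 1.0))

# Hypothetical (already log-transformed) trait values.
trait_x = {"A": 1.0, "B": 3.0, "C": 5.0, "D": 9.0}
trait_y = {"A": 2.0, "B": 2.5, "C": 2.2, "D": 3.0}

_, _, cx = contrasts(tree, trait_x)
_, _, cy = contrasts(tree, trait_y)
print(len(cx), "contrasts from 4 species")
print("r through origin =", round(origin_correlation(cx, cy), 3))
```

Note that n species yield only n - 1 contrasts, and that the deepest contrast here compares the two clades directly, which is where comparisons between distantly related groups enter the analysis.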
In any case, the point of phylogenetic correction is
that considering species data as independent inflates the degrees of
freedom and increases the chance of a Type I error (i.e., thinking
there is a relationship when really there is not). This is pretty
much the result of the Martins et al. (2002) analysis, and was the
basis for the development of phylogenetic correlation methods.
This is the point I made in my letter: I used direct correlations, which
should if anything have been too likely to give a significant correlation,
but could not find one.
There is such a thing as a non-significant positive correlation. A
positive correlation with a P value of 0.051 is not less positive than
one with P value of 0.049.
I would agree at this level (both round to p = 0.05), but this is irrelevant because
they did
not have P-values anywhere near the slightly grey area immediately
surrounding p = 0.05. Theirs were between
0.15 and 0.60, which was the source of my complaint about the use of
the term "non-significant positive correlations". The question is
whether these actual results, not a hypothetical "marginal"
relationship, can fairly be called "positive correlations". They
obviously assume a priori that
such a relationship exists, because they explain the lack of
significance as being due to a reduction in statistical power when
phylogenetically corrected. But they did not do any direct
correlation analyses, so it is not possible to say whether the
correlation fails to hold up when corrected because of this effect or
simply because there is no relationship.
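As a rough check on how far such values sit from the p = 0.05 boundary, the sketch below approximates a two-sided p-value from r and n using Fisher's z-transformation. This is a normal approximation chosen because it needs only the Python standard library; the exact test uses Student's t distribution, and the function name `approx_p_value` and its `n_controls` argument are my own labels, not anything from the papers.

```python
import math

def approx_p_value(r, n, n_controls=0):
    """Approximate two-sided p-value for a correlation r based on n points.

    Uses Fisher's z-transformation (a normal approximation).
    n_controls is the number of variables partialled out, if any.
    """
    z = math.atanh(r) * math.sqrt(n - 3 - n_controls)
    # Two-sided tail area of the standard normal: 2 * (1 - Phi(|z|)).
    return math.erfc(abs(z) / math.sqrt(2))

# r and n as reported for the independent contrasts in Griffith et al. (2003).
p = approx_p_value(0.30, 22)
print(f"r = 0.30, n = 22: approximate p = {p:.2f}")
```

For r = 0.30 and n = 22 the approximation lands close to the reported p = 0.17, nowhere near the grey area around 0.05.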
Gregory also criticizes our choice of taxa. We agree that
it is problematic to
perform analysis across infraclasses [emphasis added]. This is why we used
analysis of
covariance to test for the effect of genome size on longevity while
controlling
for different orders, and why we did not report correlations using the
entire
dataset as Gregory does (Gregory, 2004).
And yet, Civetta et
al. (2004) do new analyses across infraclasses using both direct and
phylogenetic
correlations! It bears noting that phylogenetic correction does
not fix the problem, because this still involves using data from across
infraclasses in the same analysis. In fact, this might make the
problem worse, because now there are independent contrasts being made
directly between infraclasses at the deeper nodes.
Our use of maximum body size can be
simply explained by the well-known fact that fishes have indeterminate
growth. Using mean weight values
for fish would be more problematic than using
maximum values. Mammals may indeed be better dealt with on the basis of
mean
values because they have determinate growth. So his imagined comparison
based
on a human value of 500 kg is irrelevant.
Indeterminate
growth or not, how many rainbow trout actually weigh 25.4 kg? On
average, rainbow trout probably weigh closer to 0.5-2 kg. How
often does one encounter a 3 kg goldfish? My point was that we
want our analyses to be based on biologically
meaningful data, and
that using such extremes (including
maximum recorded lifespan, for that matter) may not reflect what is
happening with the vast majority of real-life organisms. In my
opinion, using 25 kg for the trout mass value is not any more realistic
than using 500 kg for humans, despite differences in their
developmental programs.
His criticism of the fact that the
significant positive correlation
between mass-corrected longevity and genome size within the order
Acipenseriformes could be biased due to polyploidy is relevant and
deserves
consideration. Using species of the Acipenseriformes order listed in
Appendix 1
(Griffith et al., 2003) and controlling for both body weight and
chromosome
number results in a non-significant positive correlation between
c-value and
longevity (r = 0.51; p = 0.20; n = 10).
My result using
Griffith et al.'s (2003) appendix, log-transformed, with some
updates on longevity from Fishbase
and genome size and chromosome numbers from the Animal
Genome Size Database, gave r = -0.76, p = 0.03. Perhaps
this major discrepancy is again based on a lack of log-transformation
by Civetta et al. (2004). I was only able to come up with n = 8
data points for which all four necessary parameters (DNA content,
chromosome number, body mass, and longevity) were available. They
seem to have found two more, but did not specify where these came
from, so I can't
double-check whether these made such a big difference. If so,
then I'd say the relationship is very unreliable, since adding two
new points changes it completely.
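Since both sides report partial correlations with one or two control variables, here is a minimal Python sketch of how such a value can be computed. It uses the standard recursive formula for partial correlations and, for a single control, checks it against the correlation of regression residuals (the equivalence mentioned earlier). The eight data points are invented for illustration and are not the sturgeon values; all variable names are hypothetical.

```python
import math

def pearson(x, y):
    """Plain Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def partial_r(x, y, controls):
    """Partial correlation of x and y given a list of control variables."""
    if not controls:
        return pearson(x, y)
    *rest, z = controls
    r_xy = partial_r(x, y, rest)
    r_xz = partial_r(x, z, rest)
    r_yz = partial_r(y, z, rest)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

def residuals(y, x):
    """Residuals of a simple least-squares regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return [b - (my + slope * (a - mx)) for a, b in zip(x, y)]

# Hypothetical log-transformed values for eight species (illustration only).
log_genome = [0.20, 0.25, 0.31, 0.60, 0.62, 0.58, 0.90, 0.28]
log_longevity = [1.2, 1.5, 1.3, 1.7, 1.4, 1.8, 1.9, 1.1]
log_mass = [0.5, 1.1, 0.9, 1.6, 1.2, 2.0, 2.3, 0.4]
log_chrom = [2.05, 2.08, 2.07, 2.38, 2.40, 2.39, 2.55, 2.06]

r_mass = partial_r(log_genome, log_longevity, [log_mass])
r_resid = pearson(residuals(log_genome, log_mass),
                  residuals(log_longevity, log_mass))
r_both = partial_r(log_genome, log_longevity, [log_mass, log_chrom])

print(f"controlling for mass:                 r = {r_mass:.3f}")
print(f"same, via regression residuals:       r = {r_resid:.3f}")
print(f"controlling for mass and chromosomes: r = {r_both:.3f}")
```

With so few points, swapping or adding even two species in the input lists can change the sign of the result, which is easy to try with this sketch.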
The problem of including non-phylogenetically independent
data still remains in this dataset. The limitation due to the lack of a
well-resolved phylogeny for the orders used in our original study
(Griffith et
al., 2003) is one that should receive closer attention in future
comparative
studies across species. Using sequence data
from mitochondrial cytochrome b available in GenBank (http://www.ncbi.nlm.nih.gov)
for species
belonging to the Acipenseriformes, Cypriniformes and
Salmoniformes, a
preliminary gene tree can be used to perform phylogenetically
independent
contrasts between differences in c-value, maximum age, body mass and
chromosome
numbers. A partial correlation between differences in genome size and longevity while controlling for
differences in body
mass and chromosome number results in a significant and
positive
correlation (r = 0.37; p = 0.034; n = 35). Caution should be exercised due to the fact that
a
cytochrome b gene tree does not necessarily reflect the true species
phylogeny.
However, we found only positive
partial
correlations among differences in genome size and longevity
when we
tried different tree topologies and branch lengths.
In
summary, the use of our entire dataset (see Appendix 1, Griffith
et al.,
2003) shows only a positive correlation between genome size and
longevity for
phylogenetically independent
comparisons
after controlling for differences in body
mass and chromosome number. Gregory’s finding of a significant negative correlation
within the
teleosts and using only Acipenseriformes is flawed by the
problem of
comparing non-phylogenetically independent data points which will
produce less
accurate results than correlation analysis in the context of
phylogenetic
corrections (Martins et al., 2002).
They don't specify which species were used in this
analysis, where they got their chromosome number data, whether they
log-transformed, or how reliable their tree topology was. Plus,
this involves a direct comparison across infraclasses, which was one of
the criticisms they apparently agreed with (see above). Most
importantly, correcting for chromosome number across all the fish taxa
is not the point -- it's within
groups with the same basic chromosome number and basic genome size that
this is most important. I wasn't arguing that the difference in
DNA content between sturgeons and teleosts was due entirely to
polyploidy in the former (although this is a definite concern).
Rather, the point is that finding significant correlations within the Acipenseriformes
(which
is the only place they found one) is problematic, because here the
differences in DNA content are
most likely
due to polyploidy. I would recommend
correcting for chromosome number if a comparison were being done across
teleosts including salmonids, cyprinids, and mostly diploid orders --
but I do not think that
correcting for chromosome number removes the problem of comparing
sturgeons with teleost groups.
Add to this his discomfort with the use of
maximum lifespan data; and one wonders why Gregory is willing to use
this data
set to suggest his potential negative relationship between genome size
and
longevity in fishes.
I was very
tentative in this suggestion: "[the correlation] is actually negative
... (at least by direct
correlation in a rather small sample)" and "the potential negative
relationship in fishes is interesting and may warrant further
investigation -- using a broad dataset without such complicating
factors as polyploidy and distant relatedness" [emphasis added].
Overall, I am not convinced that there is any noteworthy correlation in
either direction, or else I
would have done a full analysis using the database, as I did with
mammals and birds.
Correlation studies will constantly face the brick wall
of being unable to make a connection between an interesting observation
and its
causal explanation. Many studies on the evolution of genome
size that use correlation approaches
are particularly
vulnerable to this problem as experimental genome size manipulations
are not
feasible.
The causes of
possible relationships between genome size and lifespan have nothing to
do with this discussion. The question is whether any such
correlations even exist. (Note that Griffith et al. 2003 do
briefly discuss "causation" -- by listing correlations between genome
size and cell cycle and metabolic parameters!).
In the meantime, studies about the evolution of genome size will
certainly benefit from more accurate measures of genome size, than the
currently used literature collection of picogram per diploid cell
(Gregory,
2001), coming out of genome
sequencing projects and the analysis of interspecies data in the
context of well-resolved
species phylogenies.
While the personal
motivation for this shot at the database is pretty transparent, I will
respond to it because it exposes some interesting misunderstandings
about genome sizing methodology and other concepts.
First, the database does not list "picogram[s] per
diploid cell", it lists picograms per haploid genome, in keeping with
the definition of "genome size". Griffith et al. (2003) were
evidently confused about this, giving haploid ranges for fishes but
diploid ones for birds.
Second, their implied point that, say, "base
pairs" would be more accurate than "picograms" is misguided, because
these two units are directly inter-convertible using a formula clearly posted on the
database.
Third, if Civetta et al. (2004) think this
"literature collection" is so problematic, why do they base their entire study on it? (For the record, the online genome size
database is a compilation of published data from nearly 400 sources --
far more if one also considers the Plant DNA C-values
Database assembled by my friends at RBG Kew).
Fourth, it is extraordinarily unlikely that genome
sequencing will become a common method of analyzing genome size anytime
soon, for at least three reasons: 1) it costs way too much to be an
efficient way to get genome size data, 2) only small (or
medically/agriculturally
important) genomes are so far being chosen for sequencing due to this
cost (and, in
fact, this is why members of the genome sequencing community are
frequent visitors to the database), and 3) sequencing
projects are usually not "complete", which means they are not actually
very accurate for estimating genome size (e.g., Bennett et al.
2003). Furthermore, the reality is that any genome size data
produced by genome sequencing simply will be added to the database.
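The unit points above (haploid vs. diploid values, and picograms vs. base pairs) can be sketched in a few lines of Python. The conversion factor used here, 0.978 x 10^9 bp per pg, is the commonly cited one and is an assumption on my part in this sketch; the formula posted on the database is not quoted in this text. The function names are hypothetical.

```python
# Assumed conversion factor: 1 pg of DNA is roughly 0.978 x 10^9 base pairs.
BP_PER_PG = 0.978e9

def pg_to_bp(picograms):
    """Convert a DNA mass in picograms to a number of base pairs."""
    return picograms * BP_PER_PG

def bp_to_pg(base_pairs):
    """Convert a number of base pairs to a DNA mass in picograms."""
    return base_pairs / BP_PER_PG

def haploid_from_diploid(pg_per_diploid_nucleus):
    """Genome size (haploid, 1C) from a diploid (2C) nuclear DNA content."""
    return pg_per_diploid_nucleus / 2

# Example: the ~157 Mb Arabidopsis estimate of Bennett et al. (2003).
arabidopsis_pg = bp_to_pg(157e6)
print(f"157 Mb is about {arabidopsis_pg:.2f} pg")
```

Because the conversion is a simple multiplication, neither unit is inherently more accurate than the other; what matters is the measurement behind the number and whether it refers to the haploid or diploid complement.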
In summary, the dataset used in the fish case was quite
selective, and was rather ill-chosen because it includes the
confounding factor of polyploidy. Moreover, there are many
problems with the analyses involved which raise significant questions
about the reliability of the results. Taken together, this leads
me to the conclusion that there are no
convincing relationships between genome size and longevity in fishes,
and indeed not in any other groups except
perhaps at the family level in birds (although not under my larger
analysis, and if at all then only covering a very small percentage of
the variance in the data) and at the suborder level in reptiles
(although this involves
only a very
small number of data and does not show up at other levels; Olmo
2003).
References
Bennett, M.D., Leitch, I.J., Price, H.J., and Johnston, J.S., 2003.
Comparisons with Caenorhabditis (~100 Mb) and Drosophila (~175 Mb)
using flow cytometry show genome size in Arabidopsis to be ~157 Mb and
thus ~25 % larger than the Arabidopsis Genome Initiative estimate of
~125 Mb. Annals Bot. 91: 547-557.
Civetta, A., Griffith, O.L., and Moodie, G.E.E., 2004. Genome size and
its correlation with longevity in fishes. Exp. Gerontol. 39:
861-862.
Felsenstein, J., 1985. Phylogenies
and the comparative method. Am. Nat. 125: 1-15.
Gregory, T.R., 2001. Animal Genome Size Database, http://www.genomesize.com
Gregory, T.R., 2002. Genome size and developmental parameters in the homeothermic
vertebrates. Genome 45: 833-838.
Gregory, T.R., 2003. Variation across amphibian species in the
size of the nuclear genome supports a pluralistic, hierarchical
approach to the C-value enigma. Biol. J. Linn. Soc. 79: 329-339.
Gregory, T.R., 2004. Genome size is not correlated positively with longevity in fishes (or
homeotherms). Exp. Gerontol. 39: 859-860.
Griffith, O.L., Moodie, G.E.E., and Civetta, A., 2003. Genome size and longevity
in fish. Exp. Gerontol. 38: 333-337.
Martins, E.P., Diniz-Filho, J.A.F., and Housworth, E.A., 2002. Adaptive constraints
and the phylogenetic comparative method: a computer simulation test.
Evolution 56: 1-13.
Monaghan, P., and Metcalfe, N.B., 2000. Genome size and longevity. Trends Genet. 16: 331-332.
Monaghan, P., and Metcalfe, N.B., 2001. Genome size, longevity and development
in birds. Trends Genet. 17: 568.
Morand, S., and Ricklefs, R.E., 2001. Genome size, longevity and development
in birds. Trends Genet. 17: 567-568.
Olmo, E., 2003. Reptiles: a group of transition in the evolution
of
genome size and of the nucleotypic effect. Cytogenet. Genome
Res. 101: 166-171.
Ricklefs, R.E., and Starck, J.M., 1996. Applications of phylogenetically
independent contrasts: a mixed progress report. Oikos 77: 167-172.
Vinogradov, A.E., 1995. Nucleotypic effect in homeotherms: body
mass-corrected basal metabolic rate of mammals is related to genome
size. Evolution 49: 1249-1259.