- What is "genome size"?
- Why should anyone care about genome size?
- How many species are in the database?
- How long has the database been available?
- Who assembled / moderates / pays for the database?
- Why was the database assembled?
- Who can use the database?
- Are there any charges for using the database?
- How can I search the database?
- There are two (or more) different values listed for the same species. Which one should I use?
- What if my species of interest isn't listed?
- How can I convert from picograms (pg) to base pairs (bp)?
1. What is "genome size"?
Genome size, as defined here, is the haploid nuclear DNA content, or the amount of DNA in one copy of a species' chromosomes. Haploid DNA content is also referred to as the "C-value". For diploid organisms, C-value and genome size are the same thing. For polyploids, it's a little more complicated, because the C-value may actually include several genomes. I have side-stepped this issue here, because the vast majority of animals are diploids, and because "genome size" is a much better known term than "C-value".
2. Why should anyone care about genome size?
Many reasons. Practically speaking, you need to know how much DNA is in a genome before you sequence it or do any genetic library work with it. From a theoretical standpoint, genome size evolution has been a puzzle in genetics for more than 50 years and has important implications for evolutionary theory in general.
3. How many species are in the database?
The database is continually expanding, and Release 2.0 provides up-to-the-minute statistics on the number of species and records it contains. You can find a general summary on the home page and more specific details on the statistics page. Although the database contains information for thousands of species, it is important to bear in mind that there are probably several million species of animals. Indeed, several major groups (especially among invertebrates, but also among vertebrates) remain very poorly characterized in terms of genome size variation, meaning that there is still a great deal of work to be done in this area.
4. How long has the database been available?
The first version went online in January 2001. Release 2.0 was launched in December 2005.
5. Who assembled / moderates / pays for the database?
The database was assembled and is moderated by Dr. T. Ryan Gregory. To maintain control and to make use of advanced programming features, the database is posted on a paid server. Until recently the costs were covered out of pocket by Dr. Gregory; they are now paid for using grant support. Donations are not solicited for supporting the database, but you can help by drawing attention to newly published data and by participating in the discussion forum. Buying Dr. Gregory's book wouldn't hurt, either!
6. Why was the database assembled?
The Animal Genome Size Database is the only comprehensive database of animal genome size data, but it is not the only genome size database to be assembled. Botanists have been compiling genome size data since the 1970s, and the Plant DNA C-values Database has been available online since 1997. A few small animal databases have been posted in the past, but nothing comparable to the plant database. The idea of assembling one for animals was discussed in the late 1990s by Dr. Gregory and his then-Ph.D. advisor, Dr. Paul Hebert. However, the impetus to actually create it did not come until Dr. Gregory tried to find the paper describing the relationship between genome size and cell size in mammals, a relationship that everyone assumed forms the basis of the correlation between genome size and metabolic rate. The problem was, there was no such paper. So, Dr. Gregory compiled as many genome size and cell size data for mammals as he could and wrote the paper himself. He then did the same thing for birds, and then decided to proceed with compiling data for all animals. The motivation then, as now, was to facilitate broad studies of the patterns and consequences of genome size variation among animals at large.
7. Who can use the database?
Anyone, as long as they cite it. That said, it is important to note that the database as a whole represents copyrighted intellectual property. As such, the assembled data taken from the database (which include updates and corrections to the data and taxonomy) must not be reproduced without permission in published lists, online databases, or other such formats. Downloaded data should also not be redistributed; instead, colleagues should be referred to the database to download the data directly. The intent is for the data to be standardized, kept up to date, and made freely available to anyone for academic purposes, but it is also important that the investment in time, money, and energy required to create and maintain the database be protected. If you have questions about specific uses of the data that you would like to propose, please feel free to contact Dr. Gregory directly.
The correct citation of the database is as follows:
Gregory, T.R. (CURRENT YEAR). Animal Genome Size Database. http://www.genomesize.com.
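If you build reference lists programmatically, the following minimal Python sketch fills in "CURRENT YEAR" in the citation format given above. The function name agsd_citation is hypothetical. The function defaults to the current calendar year, and you can pass the year you actually accessed the data instead.

```python
from datetime import date


def agsd_citation(year=None):
    """Return the Animal Genome Size Database citation string.

    Hypothetical helper: it only reproduces the citation format shown in
    this FAQ, substituting a year for "CURRENT YEAR" (default: today's year).
    """
    if year is None:
        year = date.today().year
    return f"Gregory, T.R. ({year}). Animal Genome Size Database. http://www.genomesize.com."


if __name__ == "__main__":
    print(agsd_citation())
```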
8. Are there any charges for using the database?
No, and there are no plans to institute any.
9. How can I search the database?
As this is a fully customised, searchable, dynamic web site, the best way to search is to visit the Search page. There you will find many ways to customise your search: selecting whole groups of animals or specific taxonomic classes, searching by genus, species, or common name, and even filtering by the method used to obtain the listed values. Once you have run your query, you can drill down to the record page for a selected species, which shows all the collected information and indicates whether there are any other entries for that species.
10. There are two (or more) different values listed for the same species. Which one should I use?
In general, discrepancies between values reflect experimental error rather than real intraspecific variation. One common strategy is to average the values if they do not differ by a large amount. It is also possible to make a judgment based on the other information given, which includes the publication date and the method and standards used. For example, more recent measurements using flow cytometry or Feulgen densitometry are probably more reliable than older ones using bulk biochemical analysis. A value is also usually more reliable if the standard and unknown were of the same cell type (e.g., bird erythrocytes vs. bird erythrocytes, instead of bird erythrocytes vs. human leukocytes). This relates to the differential uptake of stains according to DNA compaction levels (see Hardie et al. 2002), which applies to both Feulgen densitometry and flow cytometry. Finally, if you are not at all sure what to do and the values differ substantially, you can consult the original papers and decide which one seems to have followed the best practice overall (or, failing that, contact Dr. Gregory for his input). You can also check whether one of the values appears to be an obvious outlier relative to other members of the group.
It must be stressed that discrepancies in the database are not the fault of the database moderator! All estimates located in the literature have been included, and unfortunately these are not all equally reliable. They have been standardized to the greatest extent possible, but if the methods were not accurate, then the values will not be accurate either. In the future, better standardization of methods would help tremendously, but until then we are all left with unavoidable errors to work around.
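The "average the values if they do not differ by a large amount" strategy can be sketched in Python as follows. The FAQ does not define "a large amount", so the 10% relative-spread cut-off and the function name combine_estimates are assumptions chosen for illustration. When the estimates disagree beyond the cut-off, the function returns None so that you can fall back on the manual checks described above (method, standard, cell type, outliers).

```python
from statistics import mean

# Assumption: "a large amount" is taken to mean a spread greater than 10% of
# the mean. The FAQ gives no number, so adjust this to suit your study.
MAX_RELATIVE_SPREAD = 0.10


def combine_estimates(values_pg, max_spread=MAX_RELATIVE_SPREAD):
    """Average several genome size estimates (in pg) for one species.

    Returns the mean if (max - min) / mean is within max_spread, otherwise
    None to flag that the records should be examined individually.
    """
    if not values_pg:
        raise ValueError("need at least one estimate")
    avg = mean(values_pg)
    spread = (max(values_pg) - min(values_pg)) / avg
    return avg if spread <= max_spread else None


if __name__ == "__main__":
    print(combine_estimates([2.10, 2.20]))  # close agreement: averaged
    print(combine_estimates([1.0, 2.0]))    # large discrepancy: None
```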
11. What if my species of interest isn't listed?
Unfortunately, it is likely that no data are available for it, especially if it's an invertebrate. The first step is to make sure that the name you are using is up to date. Online taxonomic databases like FishBase, Amphibian Species of the World, AmphibiaWeb, Mammal Species of the World, Avibase, EMBL Reptile Database, and the Integrated Taxonomic Information System can help you to be sure. There is also a relatively slim possibility that it has been published and Dr. Gregory has missed it, so a literature search may turn something up (please share this information if it does!). Finally, there is an even slimmer possibility that someone has measured it and hasn't gotten around to publishing it yet (sometimes unpublished data are posted here, but usually not if they are part of a larger study in progress).
12. How can I convert from picograms (pg) to base pairs (bp)?
Number of base pairs = (mass in pg) × 0.978 × 10^9
1 pg = 978 Mb
(For derivation of this formula, see Dolezel et al., Cytometry 51A: 127-128, 2003).
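The conversion above can be wrapped in a small Python helper, shown here as a minimal sketch. The names BP_PER_PG, pg_to_bp and bp_to_pg are illustrative. The 3.5 pg value in the example is an arbitrary number and does not come from any database record.

```python
# Conversion factor from the formula above: 1 pg of DNA = 0.978 x 10^9 bp (978 Mb)
BP_PER_PG = 0.978e9


def pg_to_bp(pg):
    """Convert a DNA mass in picograms to a number of base pairs."""
    return pg * BP_PER_PG


def bp_to_pg(bp):
    """Convert a number of base pairs to a DNA mass in picograms."""
    return bp / BP_PER_PG


if __name__ == "__main__":
    c_value_pg = 3.5  # arbitrary illustrative haploid DNA content
    print(f"{c_value_pg} pg = {pg_to_bp(c_value_pg) / 1e6:.0f} Mb")  # prints "3.5 pg = 3423 Mb"
```

The reverse function simply divides by the same factor, so a value converted one way and back returns to its starting point (up to floating-point rounding).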