Introduction

In the most general terms, our research focus can be described as “the evolution of genomes at large”, and the cellular, organismal, and evolutionary implications thereof. In particular, our primary topic of study in this regard is the extensive variation in nuclear DNA contents (“C-values”) among eukaryotes. Overall, the size of a eukaryotic nuclear genome bears no relationship to the number of genes it contains or the complexity of the organism in which it is found. For the first few decades of genome size study (ca. 1950-1980), this was considered highly counterintuitive, and even became known as the “C-value paradox” in the early 1970s. The “paradox” was resolved more than 25 years ago with the discovery of non-coding DNA, but a new and multi-faceted puzzle known as the “C-value enigma” has emerged in its place.

Under the early view that genomes contained the genes, all the genes, and nothing but the genes, one would not be surprised to find that the genome of a simple animal like a placozoan contains far less DNA than that of a complex one like humans. However, one would not have expected the Marbled (African) lungfish genome to contain nearly 40 times as much DNA as our own. This formed the basis of the so-called “C-value paradox”, but is explained by the existence of a large but interspecifically variable amount of non-coding DNA in most animal genomes. Note that genome sizes are typically given either in terms of mass (in picograms, where 1 pg = 10⁻¹² g) or number of base pairs (usually in millions of bases, or megabases, where 1 pg = 978 Mb).
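Because both units appear throughout the genome size literature, a small conversion helper can be handy. The sketch below assumes only the 1 pg = 978 Mb factor stated above, plus the convention that a 2C value from diploid cells is twice the haploid (1C) value; the function names are illustrative and not part of any published tool.

```python
# Conversion factor stated in the text: 1 pg of DNA corresponds to ~978 Mb.
MB_PER_PG = 978.0


def pg_to_mb(pg: float) -> float:
    """Convert a DNA mass in picograms to a length in megabases."""
    return pg * MB_PER_PG


def mb_to_pg(mb: float) -> float:
    """Convert a length in megabases to a DNA mass in picograms."""
    return mb / MB_PER_PG


def haploid_from_2c(two_c_pg: float) -> float:
    """Return the haploid (1C) value from a 2C measurement of diploid cells."""
    return two_c_pg / 2.0


if __name__ == "__main__":
    # Human 2C = 7.0 pg, the value quoted in a figure caption on this page.
    human_1c = haploid_from_2c(7.0)
    print(f"{human_1c} pg = {pg_to_mb(human_1c):.0f} Mb")
```

Run as a script, this reports the human haploid genome as about 3,423 Mb, consistent with the "more than 3 billion base pairs" figure for the human genome.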

The C-value enigma consists of several independent but equally important component questions, many of which are being addressed in our research. These include: What sorts of DNA sequences make up the non-coding majority of different genomes? How is non-coding DNA gained and lost from genomes over evolutionary time? Does non-coding DNA have any effects on, or perhaps even functions for, the cells and organisms in which it resides? Why are some genomes so streamlined while others are positively enormous?

The components of the human genome. Only about 1.5% of the more than 3 billion base pairs encode protein products, whereas about 45% consists of “genomic parasites” (transposable elements: DNA transposons, LTR retrotransposons, and Long and Short Interspersed Nuclear Elements [LINEs and SINEs]) and especially their extinct remnants. From Gregory (2005b).

To address the causes and consequences of variation in genome size, our lab makes use of the compiled data in the Animal Genome Size Database (a comprehensive online catalogue of nearly 4,000 animal genome sizes created by Dr. Gregory in 2001) as well as large numbers of original estimates from a diverse array of animals. Using these broad datasets, we are conducting comparative analyses of genome size variation in different groups of both vertebrates and invertebrates in order to elucidate the general patterns of distribution among taxa and the relationships between genome size and diverse cellular, organismal, and ecological parameters (e.g., cell size, body size, metabolic rate, developmental complexity, geographical distribution, and species richness). In terms of original data, our primary focus is on invertebrates, which are badly understudied from the perspective of genome size variation. More broadly, we are also interested in the evolution of key genomic components, especially transposable elements, that contribute substantially to genome size diversity. At a larger scale, we have an interest in the frequency and implications of whole-genome duplications (i.e., polyploidy), especially in animals.

As genome size increases in eukaryotes, the percentage of the genome composed of genes (white circles) becomes smaller. Conversely, the percentage of the genome made up of transposable elements (black circles) increases with genome size. From Gregory (2005b).

Patterns of variation


The genome sizes of eukaryotes have been reported to vary over 300,000-fold, although the high end of this distribution (Amoeba at 700 pg) is probably not reliable. Genome sizes are known to range more than 3,300-fold in animals, and in vertebrates alone the variation is greater than 350-fold. Of the more than 4,000 species for which genome size data are now available, about 2/3 are vertebrates. Thus, despite their overwhelming numerical dominance, invertebrates remain vastly underrepresented in the genome size dataset. Nevertheless, it is clear that extraordinary variation typifies invertebrate taxa as well. One of the major objectives of our lab is to greatly expand the coverage of invertebrate genome sizes; without such data, a truly comprehensive understanding of animal genome size evolution will not be possible.
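A "fold" range is simply the largest value divided by the smallest, and taking its base-10 logarithm expresses it in orders of magnitude. The sketch below is illustrative only; it uses the 2C values quoted in the figure captions on this page, and because all four are 2C values their ratio equals the ratio of the corresponding haploid values.

```python
import math


def fold_range(values_pg):
    """Ratio of the largest to the smallest genome size in a collection."""
    if not values_pg or min(values_pg) <= 0:
        raise ValueError("need at least one positive genome size")
    return max(values_pg) / min(values_pg)


def orders_of_magnitude(fold: float) -> float:
    """Express a fold difference on a log10 scale."""
    return math.log10(fold)


if __name__ == "__main__":
    # 2C values (pg) quoted in the figure captions on this page.
    two_c = {
        "Betta splendens": 1.3,
        "Homo sapiens": 7.0,
        "Neoceratodus forsteri": 105.0,
        "Amphiuma means": 165.0,
    }
    fold = fold_range(list(two_c.values()))
    print(f"{fold:.0f}-fold ({orders_of_magnitude(fold):.1f} orders of magnitude)")
```

These four vertebrates alone span about 127-fold. By the same arithmetic, the 300,000-fold range reported across eukaryotes corresponds to roughly 5.5 orders of magnitude.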

The reported ranges in haploid nuclear DNA content among the major groups of living organisms. From Gregory (2005b).

Nucleus and cell size


Some of the best-known consequences of increased DNA content are positive effects on nucleus size and cell size and a negative effect on cell division rate. That is to say, large genomes are usually found in large, slowly dividing cells. Indeed, a general association between nucleus size and cell size has been appreciated for well over a century, long predating the rise of molecular biology or genomics.

The strong positive relationship between nucleus size (black ellipses), and thus DNA content, and red blood cell size (clear ellipses) in vertebrates. The figure on the left was drawn by George Gulliver in 1875; the one on the right shows photomicrographs of cells taken at the same magnification by Gregory (2001a).

It is likely that genome size directly affects nucleus size, which in turn causally influences cell size, perhaps through effects on the duration of the cell cycle and on cellular growth. To be sure, the cells of some species would be physically incapable of containing the DNA found in other species, meaning that there must be some causal association at least at some level. It is also interesting to note that in mammals, whose mature red blood cells are enucleated (i.e., do not contain genomes), there is still a positive correlation between genome size and red blood cell size.

Photomicrographs of red blood cells from the Siamese fighting fish (Betta splendens; 2C = 1.3 pg) and the Australian lungfish (Neoceratodus forsteri; 2C = 105 pg), taken at the same magnification, showing that the nuclei of the latter could not physically fit inside the cells of the former. “2C” refers to the DNA content of these diploid cells. From Gregory (2001b).

Several theories have been proposed to explain these relationships between genome size, nucleus size, cell size, and cell division rate, but this remains a subject of debate because direct tests have not yet been performed. Moreover, most of the data demonstrating these associations come from red blood cells (erythrocytes) in vertebrates and meristems and leaves in plants. Our lab is therefore interested in examining the effects of DNA content variation on different cell types, and in investigating their possible mechanistic underpinnings.

The extraordinary variation in red blood cell sizes among vertebrates. In both images, the large elliptical cells in the centre are those of the aquatic salamander Amphiuma means (2C = 165 pg) and the small discs surrounding them are those of humans (2C = 7.0 pg). Note that mature mammalian red blood cells can achieve a particularly tiny size despite a relatively large genome because they do not contain nuclei. (A) taken using light microscopy by Wintrobe (1933); (B) taken using scanning electron microscopy by Lewis (1996). As reprinted in Gregory (2005a).

Morphology


Cell size, and by extension genome size, is potentially related to several key parameters at the organism level. The most obvious of these is body size, given that animal bodies are made of cells. Indeed, in several invertebrate groups, genome size is positively correlated with body size, although this is far from universal and does not apply in many other taxa. Our lab is continuing to evaluate possible relationships between genome size and morphological parameters in various invertebrate groups.
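Because genome size and body size each span orders of magnitude, comparative tests of such relationships are commonly run on log-transformed values. The sketch below computes a Pearson correlation and an ordinary least-squares slope on log10 scales; it is an illustrative assumption about how such a test might be set up, not our lab's analytical pipeline. Note in particular that it treats species as independent data points, whereas published comparative analyses often correct for shared ancestry (for example, with phylogenetically independent contrasts).

```python
import math


def log_log_fit(genome_sizes, trait_values):
    """Return (Pearson r, OLS slope) for log10(trait) against log10(genome size).

    Treats every species as an independent data point, i.e. it does NOT
    correct for shared ancestry (phylogenetic non-independence).
    """
    if len(genome_sizes) != len(trait_values) or len(genome_sizes) < 3:
        raise ValueError("need two equal-length sequences of at least 3 values")
    xs = [math.log10(v) for v in genome_sizes]
    ys = [math.log10(v) for v in trait_values]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("values must vary to compute a correlation")
    r = sxy / math.sqrt(sxx * syy)
    slope = sxy / sxx  # exponent b in trait = a * genome_size**b
    return r, slope


if __name__ == "__main__":
    # Synthetic values generated from trait = 2 * genome_size**0.5,
    # used only to check the arithmetic; these are NOT measurements.
    sizes = [1.0, 2.0, 4.0, 8.0]
    traits = [2.0 * s ** 0.5 for s in sizes]
    r, slope = log_log_fit(sizes, traits)
    print(f"r = {r:.3f}, slope = {slope:.3f}")
```

The slope on log-log axes is the exponent of a power-law (allometric) relationship, which is why this form is convenient for traits such as body size.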

Two examples of a correlation between genome size and body size in particular groups of animals. In both flatworms (which are soft-bodied; left) and copepod crustaceans (hard-bodied; right), species with larger genomes also have larger bodies. From Gregory et al. (2000). This type of relationship is not found in all groups, however.

Metabolism


Red blood cells function primarily in gas exchange, and as such variation in their sizes can potentially affect organismal features such as metabolic rate (e.g., through surface-area-to-volume ratio and ion exchange effects). In both mammals and birds, there is a clear inverse correlation between genome size/cell size and metabolic rate. It is likely that both mammals and birds maintain small genomes due to the strong constraints on cell size imposed by the metabolic demands of homeothermy (“warm-bloodedness”). A similar relationship has long been assumed for amphibians, but more careful analysis by our lab has revealed that this does not hold well within either frogs or salamanders. Further study along these lines, and especially with regard to groups besides homeotherms and amphibians, is another area of interest for our lab.
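The surface-area-to-volume argument can be made concrete with simple geometry. The sketch below assumes an idealized spherical cell (real red blood cells are ellipsoids or, in mammals, biconcave discs), for which SA = 4πr², V = (4/3)πr³, and therefore SA/V = 3/r. For any family of geometrically similar shapes, SA/V scales as V^(-1/3), so the qualitative conclusion does not depend on the spherical assumption.

```python
import math


def sphere_sa_to_v(volume: float) -> float:
    """Surface-area-to-volume ratio of a sphere with the given volume.

    For a sphere, SA = 4*pi*r**2 and V = (4/3)*pi*r**3, so SA/V = 3/r.
    """
    radius = (3.0 * volume / (4.0 * math.pi)) ** (1.0 / 3.0)
    return 3.0 / radius


if __name__ == "__main__":
    small = sphere_sa_to_v(100.0)  # arbitrary volume units (e.g. cubic micrometres)
    large = sphere_sa_to_v(200.0)  # twice the volume
    print(f"doubling volume lowers SA:V by a factor of {small / large:.2f}")
```

Doubling cell volume reduces the relative surface available for exchange by a factor of 2^(1/3), or about 1.26, which illustrates why large cells may be disfavoured under high metabolic demand.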

Average genome size versus relative flight ability in birds. There is a significant negative correlation between genome size and metabolic rate in birds (and mammals), with strong-flying birds (and bats) having especially small genomes. From Gregory (2005a).

Development


Cell growth and division represent a critical component of organismal development, and as a result there is the potential for genome size to be important here as well. In keeping with this, negative correlations have been found between genome size and developmental rate, growth rate, and limb regeneration rate in various groups (especially amphibians and plants). However, developmental rate is only one side of the coin: the flip side is “developmental complexity”, which reflects the amount of development to be completed in a given amount of time. This is especially clear when considering metamorphosis, which is a strongly time-limited period of intense tissue growth and differentiation. Groups that undergo rapid or intense metamorphosis tend to have small genomes, whereas those that do not metamorphose may have much larger genomes. This pattern is clear among amphibians, and may also apply to insects. Our lab is investigating the latter possibility by compiling data from insects exhibiting different modes of development.

Genome size and developmental complexity in insects. Groups with complete metamorphosis involving distinct egg, larval, pupal, and adult stages (“holometabolous development”) are shown at the top, whereas those below the line display either incomplete metamorphosis involving a series of nymphal moults (“hemimetabolous development”) or no metamorphosis at all (“ametabolous development”). Based on currently available data, it appears that only a few beetle species cross a hypothetical threshold of 2 pg among holometabolous orders, whereas the hemimetabolous and ametabolous orders commonly contain species whose genomes exceed this threshold by a wide margin. From Gregory (2005a).

Other parameters


In addition to the parameters listed in the previous sections, our lab is investigating possible associations between genome size and such parameters as geographical distribution, species richness, and other features at the subgenomic, chromosomal, nuclear, and cellular levels. What is clear from these analyses is that genome size changes by a variety of mechanisms and can have many different impacts on organisms through its influence on cell size and cell division rate. It is also clear that, although the cell-level consequences of genome size change appear to be universal, the specific organism-level implications vary according to the biology of the group in question. Identifying the specific impacts of genome size variation in different animals represents an exciting and important component of our overall research program.

The multi-faceted and cascading consequences of variation in genome size. Nuclear DNA content is influenced by a variety of subgenomic processes, and in turn affects cell size and cell division rate. This plays out in different ways according to the biology of the group under study. In mammals and birds, metabolic rate appears to be affected, whereas in amphibians and insects, development is the parameter that is most strongly impacted. At higher levels, these relationships can influence the types of ecological options available to different organisms, making genome size an important consideration in many areas of biological research.

Variation within individuals


Nuclear DNA contents vary not only across species, but also among different cell types within the bodies of individual organisms. Human liver cells (hepatocytes), for example, may be 4C (tetraploid), 8C (octoploid), or higher. The silk gland cells of silkworm moth larvae may hold the record in this regard, reaching a level as high as 500,000-ploid. This process of “endopolyploidy” often represents a functional amplification of DNA content among cells that produce a high output of protein products. Examining the patterns of occurrence and adaptive significance of endopolyploidy is another area of interest for our lab.
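If each round of endoreduplication is assumed to double the nuclear DNA content exactly, the number of rounds needed to reach a given level is a base-2 logarithm. The sketch below works through the examples above from a diploid (2C) starting point; the doubling assumption is a simplification, since real endocycles can involve incomplete or uneven replication of parts of the genome.

```python
import math


def endocycles_needed(target_ploidy: float, start_ploidy: float = 2.0) -> float:
    """Number of whole-genome doublings to go from start_ploidy to target_ploidy.

    Assumes each endocycle exactly doubles the nuclear DNA content.
    """
    if start_ploidy <= 0 or target_ploidy < start_ploidy:
        raise ValueError("start must be positive and target must be >= start")
    return math.log2(target_ploidy / start_ploidy)


def ploidy_after(cycles: int, start_ploidy: float = 2.0) -> float:
    """Ploidy (C level) reached after a given number of complete doublings."""
    return start_ploidy * 2 ** cycles


if __name__ == "__main__":
    print(f"4C hepatocyte: {endocycles_needed(4):.0f} cycle(s)")
    print(f"8C hepatocyte: {endocycles_needed(8):.0f} cycles")
    print(f"500,000-ploid silk gland: ~{endocycles_needed(500_000):.1f} cycles")
```

Under this assumption, the silk gland cells would need about 18 rounds of doubling; 18 complete doublings from 2C give 524,288C.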

An example of the adaptively significant multiplication of the genome in specialized cells. (A) represents an ovariole from an ovary of the fruitfly Drosophila melanogaster. This species has specialized cells (“trophocytes”) that provide nourishment to the maturing egg. In order to meet this demand, the trophocytes become increasingly polyploid as the egg matures and moves along the ovariole. The most mature segment of the ovariole at the far left has broken open such that the very large trophocyte nuclei are dispersed alongside the rest of the ovariole. For comparison, note the tiniest pink circles, which are diploid nuclei. (B) shows the equivalent structure from the cat flea Ctenocephalides felis, which nourishes its maturing eggs by a different mechanism and therefore lacks these specialized cells with massively amplified nuclear genomes; instead, it appears that the entire ovariole becomes polyploid (though not nearly to the degree seen in fruitfly trophocytes). Photos by T.R. Gregory.

Methodology


Our lab is continuing to develop a new method of computerized image analysis for the measurement of nuclear DNA content. This technique, known as “Feulgen image analysis densitometry”, combines the advantages of more traditional methods like scanning densitometry and flow cytometry, and provides an efficient and comparatively inexpensive means of quantifying nuclear DNA. Our lab is also concerned with developing optimized protocols for image analysis and flow cytometry to allow the measurement of genome sizes from various types of tissues and from preserved material.

A generalized diagram of the Feulgen image analysis densitometry equipment that we use in our lab. By capturing an image of the microscope field on a computer, it is possible to analyze the density, and hence the DNA content, of a large number of stained nuclei instantaneously and simultaneously. This greatly increases the efficiency of the measurements and allows many more species to be analyzed rapidly. The illustration shown here includes a monochromatic light filter at 560 nm, which increases absorption by the stained nuclei, although in practice unfiltered light is typically used in our measurements. For a detailed users’ guide to Feulgen image analysis densitometry, see Hardie et al. (2002).
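The core arithmetic behind densitometry can be sketched in a few lines. In this illustration, each pixel's optical density is taken as log10(I0/I), where I0 is the background intensity of an empty field and I is the intensity transmitted through the stained nucleus; summing over a nucleus gives its integrated optical density (IOD), and DNA content is estimated by scaling to a standard of known DNA content stained and imaged alongside the sample. This is a simplified sketch under stated assumptions (a linear camera response, no glare or distributional-error corrections, and identical Feulgen hydrolysis for sample and standard); the function names and inputs are hypothetical and not the procedure of Hardie et al. (2002) in full.

```python
import math


def integrated_optical_density(nucleus_pixels, background_intensity):
    """Sum of per-pixel optical densities over one stained nucleus.

    Each pixel's optical density is log10(I0 / I), where I0 is the
    background (empty field) intensity and I is the intensity
    transmitted through the nucleus at that pixel.
    """
    iod = 0.0
    for intensity in nucleus_pixels:
        if intensity <= 0:
            raise ValueError("pixel intensities must be positive")
        iod += math.log10(background_intensity / intensity)
    return iod


def dna_content_pg(sample_iods, standard_iods, standard_pg):
    """Estimate DNA content (pg) by scaling mean IOD to a standard.

    standard_pg is the known DNA content of the standard nuclei, a
    hypothetical input supplied by the user.
    """
    mean_sample = sum(sample_iods) / len(sample_iods)
    mean_standard = sum(standard_iods) / len(standard_iods)
    return standard_pg * mean_sample / mean_standard


if __name__ == "__main__":
    # Toy pixel values, not real image data.
    iod = integrated_optical_density([100, 100, 50, 50], background_intensity=200)
    print(f"IOD = {iod:.3f}")
    # A sample whose nuclei absorb twice as much as a hypothetical 2.5 pg standard.
    print(f"{dna_content_pg([2.0, 2.0], [1.0, 1.0], standard_pg=2.5):.1f} pg")
```

Scaling against a co-stained standard, rather than using absolute densities, is what allows the method to tolerate day-to-day variation in staining intensity.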

Prof. Gregory (then a graduate student) discussing the finer points of genome size measurement with friend and colleague Prof. Ellen Rasch of East Tennessee State University, one of the world’s most experienced genome size researchers.

Feulgen-stained nuclei from (A) red blood cells from a chicken (Gallus domesticus), (B) white blood cells from a human (specifically, Prof. Gregory), (C) liver cells from a mouse, (D) haemocytes (“blood cells”) from a yellow mealworm beetle (Tenebrio molitor), (E) spermatozoa from T. molitor, and (F) coelomocytes and spermatozoa from the earthworm Eisenia fetida. From Gregory (2005a).

Evolutionary theory


In order to understand the massive variation in eukaryotic genome sizes, it is necessary to acknowledge the genome itself as a level of biological organization forged by evolutionary processes acting at lower levels (i.e., within the genome). However, genome size also affects and is influenced by features at higher levels of biological organization (e.g., metabolism or development). As such, the study of genome size evolution provides an ideal venue for the construction of a pluralistic and hierarchical conceptual framework for understanding the operation of natural selection and other evolutionary principles at multiple levels. An important part of our work involves the development of this hierarchical approach to genome size in particular, and an exploration of its implications for evolutionary theory in general.

Interactions among multiple levels of biological organization as revealed through the study of genome size evolution. In this simplified schematic, autonomous DNA sequences such as transposable elements (“selfish DNA”) increase in number within genomes due to a process of “intragenomic selection”. This influences genome size, which then affects various cellular, organismal, and ecological features. There is also a top-down pathway of influence, with ecological factors imposing selective pressures on organisms, and thence on cells and ultimately on genomes and their constituent elements. This hierarchical or multi-level approach to evolutionary biology is most common in paleontological discussions, but is also well suited to the study of large-scale genome evolution. From Gregory (2005c).

Biodiversity


The extraordinary diversity of life on Earth can be studied from a variety of perspectives. Exploring the diversity of genome sizes represents one important aspect, but our lab has other interests in this area as well. For example, we maintain close ties to the Canadian Barcode of Life Network, which is developing a DNA-based method for species identification and discovery. We are also interested in exploring two-way linkages between ecology, environment, and organism-level evolution on the one hand and patterns and processes of molecular evolution on the other. More broadly, our focus on general questions in evolutionary biology means that we have a strong interest in investigating how biological diversity arises at all levels, from elements within genomes to species within biotas. This mission is advanced by our role as members of the Biodiversity Institute of Ontario, an interdisciplinary research institute based at the University of Guelph.

References


The following list includes the publications cited on this page as well as some suggestions for further reading.

DeSalle, R., T.R. Gregory, and J.S. Johnston. (2005). Preparation of samples for comparative studies of arthropod chromosomes: visualization, in situ hybridization, and genome size estimation. Methods in Enzymology 395: 460-488.

Gregory, T.R., P.D.N. Hebert, and J. Kolasa. (2000). Evolutionary implications of the relationship between genome size and body size in flatworms and copepods. Heredity 84: 201-208.

Gregory, T.R. (2001a). Coincidence, coevolution, or causation? DNA content, cell size, and the C-value enigma. Biological Reviews 76: 65-101.

Gregory, T.R. (2001b). The bigger the C-value, the larger the cell: genome size and red blood cell size in vertebrates. Blood Cells, Molecules, and Diseases 27: 830-843.

Gregory, T.R. (2002a). A bird's-eye view of the C-value enigma: genome size, cell size, and metabolic rate in the class Aves. Evolution 56: 121-130.

Gregory, T.R. (2002b). Genome size and developmental complexity. Genetica 115: 131-146.

Gregory, T.R. (2003). Variation across amphibian species in the size of the nuclear genome supports a pluralistic, hierarchical approach to the C-value enigma. Biological Journal of the Linnean Society 79: 329-339.

Gregory, T.R. (2004). Macroevolution, hierarchy theory, and the C-value enigma. Paleobiology 30: 179-202.

Gregory, T.R. (2005a). Genome size evolution in animals. In The Evolution of the Genome (ed. T.R. Gregory), pp. 3-87. Elsevier, San Diego.

Gregory, T.R. (2005b). Synergy between sequence and size in the study of genomes at large. Nature Reviews Genetics, in press.

Gregory, T.R. (2005c). Macroevolution and the genome. In The Evolution of the Genome (ed. T.R. Gregory), pp. 679-729. Elsevier, San Diego.

Gregory, T.R. (2005d). The C-value enigma in plants and animals: a review of parallels and an appeal for partnership. Annals of Botany 95: 133-146.

Hardie, D.C., T.R. Gregory, and P.D.N. Hebert. (2002). From pixels to picograms: a beginners' guide to genome quantification by Feulgen image analysis densitometry. Journal of Histochemistry and Cytochemistry 50: 735-749.

Lewis, J.H. (1996). Comparative Hemostasis in Vertebrates. Plenum Press, New York.

Wintrobe, M.M. (1933). Variations in the size and hemoglobin content of erythrocytes in the blood of various vertebrates. Folia Haematologica 51: 32-49.