Relationship between genome size and gene number

Distinct relationships between log-transformed protein-coding gene number (Y′) versus log-transformed genome size (X′), genome size. The conversion between these units is simple, with 1 pg being roughly equivalent to Plant nuclear genomes have a huge range in size, from some Mb of DNA in a . Thus, difference in genome size roughly correlates with the apparent . levels of variation in structural genes (Nevo ), the possible relationship between these two genome parameters (size and variability) has yet to be explored. Values for the amount of nuclear DNA, chromosome number, and estimates of.

Although smaller genomes may occur in some as yet unrecognized dinoflagellates [5], typical dinoflagellate genomes are larger than those of most eukaryotes examined to date. The smallest documented dinoflagellate genomes are found in the coral reef symbiont Symbiodinium spp.

It has been suggested that a large fraction of dinoflagellate genomes consists of nonfunctional repeated DNA sequences [9], [11]-[15]. How many genes are encoded in the genomes of these unicellular and seemingly simple organisms remains an open question, one that potentially bears on eukaryotic genome evolution.

Information on the gene content of dinoflagellate genomes will allow researchers to understand how large genomes favor or disfavor these organisms across their wide range of habitats.

Unfortunately, the infeasibility of sequencing these gigantic genomes with current technology has hindered progress in understanding dinoflagellate gene content.

However, the challenge of assembling the relatively short fragments is still insurmountable, especially because in dinoflagellates many genes occur in numerous highly similar copies [16], [17]. It will likely be some time before a dinoflagellate genome can be completely sequenced and accurately assembled to give a correct gene count. Any indirect approach that provides a gene content estimate is therefore desirable at present. Taking advantage of the rapidly growing genome sequence dataset, we analyzed the relationship between gene content and genome size in all sequenced life forms.

We then used the resultant eukaryotic regression equations to estimate gene content for dinoflagellate genomes. In light of the high gene copy numbers reported for various dinoflagellates, the implications of the high gene numbers and the possible evolutionary mechanisms giving rise to the enormous genomes in this phylum are discussed.

Dataset included total number of nucleotide base pairs i.

For gene-coding percentage, only data published in peer-reviewed articles were used in the analysis, because data from JGI included introns and other untranslated regions and significantly overestimated gene-coding percentage in large eukaryotic genomes (Supplemental Table S1). Incomplete or draft genome sequence data were excluded from this study to avoid potential errors.

Regression analyses and dinoflagellate gene content prediction

The genome size and gene number datasets were subjected to Shapiro-Wilk and Kolmogorov-Smirnov normality tests using SPSS. When normality was violated, data were log-transformed.

Regression analyses of log-transformed protein-coding or total gene number (dependent variable) versus log-transformed genome size (independent variable) were conducted using linear, logarithmic, and power regression models in SPSS. The intention was to seek an overall correlation for all genomes and, if that failed, to seek separate correlations for separate groups of genomes.
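The model comparison described above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' SPSS procedure: it assumes ordinary least squares on log10-transformed values, the function names (least_squares, r_squared, fit_models) are hypothetical, and the four data points are made up for demonstration (roughly one gene per kilobase) rather than taken from the study's dataset.

```python
import math


def least_squares(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def r_squared(ys, predicted):
    """Coefficient of determination of predictions against observations."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


def fit_models(genome_bp, gene_counts):
    """Fit three candidate models to x' = log10(genome size), y' = log10(genes).

    Returns {model name: (a, b, R^2)}. R^2 is always computed on the y' scale
    so that the three fits can be compared (an assumption of this sketch).
    Requires genome sizes and gene counts above 10 so that x' and y' exceed 1.
    """
    x = [math.log10(g) for g in genome_bp]
    y = [math.log10(n) for n in gene_counts]
    ln_x = [math.log(v) for v in x]
    ln_y = [math.log(v) for v in y]
    results = {}

    # Linear model: y' = a + b * x'
    a, b = least_squares(x, y)
    results["linear"] = (a, b, r_squared(y, [a + b * v for v in x]))

    # Logarithmic model: y' = a + b * ln(x')
    a, b = least_squares(ln_x, y)
    results["logarithmic"] = (a, b, r_squared(y, [a + b * v for v in ln_x]))

    # Power model: y' = a * x'^b, fitted as ln(y') = ln(a) + b * ln(x')
    ln_a, b = least_squares(ln_x, ln_y)
    a = math.exp(ln_a)
    results["power"] = (a, b, r_squared(y, [a * v ** b for v in x]))

    return results


# Made-up example data: genome sizes in base pairs and gene counts.
example = fit_models([1e6, 2e6, 4e6, 8e6], [1000, 2000, 4000, 8000])
for name, (a, b, r2) in example.items():
    print(f"{name}: a={a:.4f}, b={b:.4f}, R^2={r2:.6f}")
```

Comparing the R^2 values across the three fits, first for all genomes together and then per group, mirrors the stepwise strategy in the paragraph above.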

Origin of the term

The term "genome size" is often erroneously attributed to Hinegardner [2], even in discussions dealing specifically with terminology in this area of research. Notably, Hinegardner [2] used the term only once. The term actually seems to have first appeared when Hinegardner wondered, in the last paragraph of his article, whether "cellular DNA content does, in fact, reflect genome size".

In a paper submitted only two months later, in February, Wolf et al. "Genome size" subsequently came into common usage with its present definition, probably as a result of its inclusion in Susumu Ohno's influential book Evolution by Gene Duplication.

Nuclear genome size is typically measured in eukaryotes using either densitometric measurements of Feulgen-stained nuclei (previously using specialized densitometers, now more commonly using computerized image analysis [7]) or flow cytometry.

In prokaryotes, pulsed-field gel electrophoresis and complete genome sequencing are the predominant methods of genome size determination.

Nuclear genome sizes are well known to vary enormously among eukaryotic species. In animals they range more than 3,000-fold, and in land plants they differ by a factor of about 1,000. However, although there is no longer any paradoxical aspect to the discrepancy between genome size and gene number, the term "C-value paradox" remains in common usage.

For reasons of conceptual clarification, one author has suggested that the various puzzles that remain with regard to genome size variation more accurately comprise a puzzle or an enigma (the C-value enigma). Genome size correlates with a range of features at the cell and organism levels, including cell size, cell division rate, and, depending on the taxon, body size, metabolic rate, developmental rate, organ complexity, geographical distribution, or extinction risk (for recent reviews, see Bennett and Leitch [8]; Gregory [9]).

Based on the completely sequenced genome data available (as of April), log-transformed gene number forms a linear correlation with log-transformed genome size in bacteria, archaea, viruses, and organelles combined, whereas it forms a nonlinear (semi-natural log) correlation in eukaryotes (Hou and Lin [10]).

The nonlinear correlation for eukaryotes, although the claim of its existence contrasts with the previous view that no correlation exists for this group of organisms, reflects the disproportionately fast increase of noncoding DNA in increasingly large eukaryotic genomes.
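A small numeric sketch can illustrate why gene number growing more slowly than genome size implies a rising share of noncoding DNA. It is an illustration under stated assumptions only: it uses a simple log-log relation with a slope below 1 as a stand-in, not the semi-natural log form reported for eukaryotes, and the intercept and slope are hypothetical values chosen for readability, not fitted coefficients from Hou and Lin [10]. The function names are likewise hypothetical.

```python
import math


def predicted_gene_number(genome_bp, intercept, slope):
    """Gene number from log10(genes) = intercept + slope * log10(genome_bp)."""
    return 10 ** (intercept + slope * math.log10(genome_bp))


def genes_per_mb(genome_bp, intercept, slope):
    """Predicted gene density in genes per megabase."""
    return predicted_gene_number(genome_bp, intercept, slope) / (genome_bp / 1e6)


# Hypothetical coefficients, for illustration only (not from the source).
HYPOTHETICAL_INTERCEPT = 0.0
HYPOTHETICAL_SLOPE = 0.5  # any slope below 1 makes density fall with size

for size_bp in (1e7, 1e9, 1e11):
    density = genes_per_mb(size_bp, HYPOTHETICAL_INTERCEPT, HYPOTHETICAL_SLOPE)
    print(f"{size_bp:.0e} bp: {density:.2f} genes per Mb")
```

With any slope below 1, the predicted gene density falls as genome size grows, so each additional megabase carries fewer genes and proportionally more noncoding sequence.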

Although sequenced genome data are practically biased toward small genomes, which may compromise the accuracy of the empirically derived correlation, and the ultimate proof of the correlation remains to be obtained by sequencing some of the largest eukaryotic genomes, current data do not seem to rule out a correlation.

Genome reduction

Figure: Genome size compared to number of genes. Log-log plot of the total number of annotated proteins in genomes submitted to GenBank as a function of genome size. Based on data from NCBI genome reports.