Evolution of Genomes

Genomic Similarities between Distant Species

Genomic similarities between distant species can be established via analysis of genomes using advanced technology.

Learning Objectives

Discuss the evolutionary implications of observed genome similarities between distant species

Key Takeaways

Key Points

  • Genomic similarities between distant species can be explained by the theory that all organisms share a common ancestor.
  • Genomic similarities between distant species can be analysed using genomic analysis tools to create phylogenetic trees that explain these relationships.
  • Genetic distance is used to explain the genetic divergence between species or between populations within a species and can indicate how closely related they are and whether they have a recent common ancestor or recent interbreeding has taken place.
  • Horizontal gene transfer (HGT) occurs when two unrelated species exchange genes, usually two prokaryotes, although HGT occurs in some eukaryotes as well.

Key Terms

  • conjugation: the temporary fusion of organisms, especially as part of sexual reproduction
  • phylogeny: the evolutionary history of an organism
  • horizontal gene transfer: the transfer of genetic material from one organism to another one that is not its offspring; especially common among bacteria
  • transformation: the alteration of a bacterial cell caused by the transfer of DNA from another, especially if pathogenic
  • transduction: horizontal gene transfer mechanism in prokaryotes where genes are transferred using a virus

Genomic Similarities Between Distant Species

Genetic distance refers to the genetic divergence between species or between populations within a species. Smaller genetic distances indicate that the populations have more similar genes, which indicates they are closely related; they have a recent common ancestor, or recent interbreeding has taken place. Genetic distance is useful in reconstructing the history of populations. For example, evidence from genetic distance suggests that humans arrived in America about 30,000 years ago. By examining the difference between allele frequencies between the populations, genetic distance can estimate how long ago the two populations were together.
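As a rough illustration of how a genetic distance can be computed from allele frequencies, the sketch below uses Nei's standard genetic distance. The text does not name a specific measure, so the choice of Nei's formula is an assumption, and the populations and frequencies are hypothetical values made up for the example.

```python
import math

def nei_distance(freqs_x, freqs_y):
    """Nei's standard genetic distance D = -ln(I) between two populations.

    Each argument is a list of loci; each locus is a list of allele
    frequencies (summing to 1), with alleles listed in the same order
    for both populations. Assumes the populations share at least one
    allele, otherwise the identity I is 0 and the logarithm is undefined.
    """
    jx = jy = jxy = 0.0
    for locus_x, locus_y in zip(freqs_x, freqs_y):
        jx += sum(p * p for p in locus_x)                      # homozygosity in X
        jy += sum(q * q for q in locus_y)                      # homozygosity in Y
        jxy += sum(p * q for p, q in zip(locus_x, locus_y))    # shared identity
    identity = jxy / math.sqrt(jx * jy)
    return -math.log(identity)

# Hypothetical allele frequencies at two loci (illustrative only).
pop_a = [[0.5, 0.5], [0.9, 0.1]]
pop_b = [[0.5, 0.5], [0.9, 0.1]]   # identical to pop_a
pop_c = [[0.1, 0.9], [0.2, 0.8]]   # clearly different frequencies

print(nei_distance(pop_a, pop_b))  # essentially zero
print(nei_distance(pop_a, pop_c))  # larger: the populations have diverged
```

Identical frequencies give a distance of essentially zero, and the distance grows as the allele frequencies of the two populations drift apart, which is the behavior the paragraph above relies on.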

Phylogenetic Relationships

Phylogeny describes the relationships of an organism, such as the relationship with its ancestors and the species to which it is most closely related. Phylogenetic relationships provide information on shared ancestry but not necessarily on how organisms are similar or different. The use of advanced genomic analysis has allowed us to establish phylogenetic trees, which map the relationship between species at a genetic and molecular level. The ability to use these technologies has established previously unknown relationships and has contributed to a more complex evolutionary history. These technologies have established genomic similarities between distant species by establishing genetic distances. In addition, the mechanisms by which genomic similarities between distant species occur can include horizontal gene transfer.


Tree of Life: Diagrammatic representation of the divergence of modern taxonomic groups from their common ancestor. This shows the genomic similarities that can exist between distant species based on their relationship with this ancestor.

Horizontal Gene Transfer

Horizontal gene transfer (HGT), also known as lateral gene transfer, is the transfer of genes between unrelated species. Genes have been shown to be passed between species which are only distantly related using standard phylogeny, thus adding a layer of complexity to the understanding of phylogenetic relationships. The various ways that HGT occurs in prokaryotes are important to understanding phylogenies. Although at present HGT is not viewed as important to eukaryotic evolution, it does occur in this domain as well. Finally, as an example of the ultimate gene transfer, theories of genome fusion between symbiotic or endosymbiotic organisms have been proposed to explain an event of great importance: the evolution of the first eukaryotic cell, without which humans could not have come into existence.

The mechanism of HGT has been shown to be quite common in the prokaryotic domains of Bacteria and Archaea, significantly changing the way their evolution is viewed. The majority of evolutionary models, such as in the endosymbiont theory, propose that eukaryotes descended from multiple prokaryotes, which makes HGT all the more important to understanding the phylogenetic relationships of all extant and extinct species. These gene transfers between species are the major mechanism whereby bacteria acquire resistance to antibiotics. Classically, this type of transfer has been thought to occur by three different mechanisms: transformation, transduction and conjugation.

Although it is easy to see how prokaryotes exchange genetic material by HGT, it was initially thought that this process was absent in eukaryotes, followed by the idea that gene transfers between multicellular eukaryotes should be more difficult. Nevertheless, HGT between distantly related organisms has been demonstrated in several eukaryotic species.

In animals, a particularly interesting example of HGT occurs within the aphid species. Aphids are insects that vary in color based on carotenoid content. Carotenoids are pigments made by a variety of plants, fungi, and microbes, and they serve a variety of functions in animals, who obtain these chemicals from their food. Humans require carotenoids to synthesize vitamin A, and we obtain them by eating orange fruits and vegetables: carrots, apricots, mangoes, and sweet potatoes. On the other hand, aphids have acquired the ability to make the carotenoids on their own. According to DNA analysis, this ability is due to the transfer of fungal genes into the insect by HGT, presumably as the insect consumed fungi for food.


Horizontal Gene Transfer in Animals: (a) Red aphids get their color from red carotenoid pigment. Genes necessary to make this pigment are present in certain fungi, and scientists speculate that aphids acquired these genes through HGT after consuming fungi for food. If genes for making carotenoids are inactivated by mutation, the aphids revert back to (b) their green color. Red coloration makes the aphids a lot more conspicuous to predators, but evidence suggests that red aphids are more resistant to insecticides than green ones. Thus, red aphids may be more fit to survive in some environments than green ones.

Genome Evolution

Processes such as mutations, duplications, exon shuffling, transposable elements and pseudogenes have contributed to genomic evolution.

Learning Objectives

Explain the importance of genomic changes in an evolutionary context

Key Takeaways

Key Points

  • Gene and whole-genome duplications have contributed accumulated changes that drive genome evolution.
  • Mutations are constantly occurring in an organism’s genome and can have a negative effect, a positive effect, or no effect at all; in every case, they still result in changes to the genome.
  • Transposable elements are regions of DNA that can be inserted into the genetic code and cause changes within the genome.
  • Pseudogenes are dysfunctional genes derived from previously functional gene relatives; a gene can become a pseudogene through the deletion or insertion of one or multiple nucleotides.
  • Exon shuffling occurs when two or more exons from different genes are combined together or when exons are duplicated, and will result in new genes.
  • Species can also exhibit genome reduction when subsets of their genes are not needed anymore.

Key Terms

  • intron: a portion of a split gene that is included in pre-RNA transcripts but is removed during RNA processing and rapidly degraded
  • exon: a region of a transcribed gene present in the final functional RNA molecule
  • pseudogene: a segment of DNA that is part of the genome of an organism, and which is similar to a gene but does not code for a gene product

Accumulating Changes Over Time

The evolution of the genome is characterized by the accumulation of changes. The analysis of genomes and their changes in sequence or size over time involves various fields. Various mechanisms have contributed to genome evolution, including gene and genome duplications, polyploidy, mutation rates, transposable elements, pseudogenes, exon shuffling, genomic reduction, and gene loss. The concepts of gene and whole-genome duplication are discussed as their own independent concepts; thus, the focus here will be on the other mechanisms.

Mutation Rates

Mutation rates differ between species and even between different regions of the genome of a single species. Spontaneous mutations occur often and can cause various changes in the genome. Mutations can result in the addition or deletion of one or more nucleotide bases. Such a change in the code can result in a frameshift mutation, which causes the code downstream of the change to be read in the wrong reading frame and thus often results in a protein becoming non-functional. A mutation in a promoter region, enhancer region or a region coding for transcription factors can also result in either a loss of function or an upregulation or downregulation in transcription of that gene. Mutations are constantly occurring in an organism’s genome and can cause either a negative effect, positive effect or no effect at all.
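The effect of a frameshift can be seen by splitting a sequence into codons before and after a single-base insertion. The following is a minimal sketch; the sequence and the position of the insertion are hypothetical and chosen only for illustration.

```python
def codons(seq):
    """Split a coding sequence into complete triplets, reading from position 0.

    Any trailing one or two bases that do not form a full codon are dropped.
    """
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]

# Hypothetical coding sequence (illustrative only).
original = "ATGGCCTTTAAAGGG"

# Insert a single extra "C" after the fourth base.
mutated = original[:4] + "C" + original[4:]

print(codons(original))  # ['ATG', 'GCC', 'TTT', 'AAA', 'GGG']
print(codons(mutated))   # ['ATG', 'GCC', 'CTT', 'TAA', 'AGG']
```

Every codon downstream of the insertion is read in a shifted frame. In this particular example the shift also happens to create TAA, a stop codon, so translation would terminate early.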


Chromosomal Mutations: Chromosomal mutations over time can accumulate and promote diversity and evolution if a produced trait is favorable.

Transposable Elements

Transposable elements are regions of DNA that can be inserted into the genetic code through one of two mechanisms. These mechanisms work similarly to “cut-and-paste” and “copy-and-paste” functionalities in word processing programs. The “cut-and-paste” mechanism works by excising DNA from one place in the genome and inserting itself into another location in the code. The “copy-and-paste” mechanism works by making a genetic copy or copies of a specific region of DNA and inserting these copies elsewhere in the code. The most common transposable element in the human genome is the Alu sequence, which is present in the genome over one million times.
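The two mechanisms can be sketched with simple string operations, treating the genome as a string of bases. This is a toy model under stated assumptions: the genome, the element `TTGCAA`, and the insertion positions are all hypothetical, and real transposition involves enzymes and target-site details that are not modeled here.

```python
def cut_and_paste(genome, start, end, target):
    """Excise genome[start:end] and reinsert it at index `target`
    of the sequence that remains after excision."""
    element = genome[start:end]
    remainder = genome[:start] + genome[end:]
    return remainder[:target] + element + remainder[target:]

def copy_and_paste(genome, start, end, target):
    """Leave genome[start:end] in place and insert a copy at index `target`."""
    element = genome[start:end]
    return genome[:target] + element + genome[target:]

# Hypothetical genome with a transposable element "TTGCAA" at positions 4..9.
genome = "AAAA" + "TTGCAA" + "CCCC" + "GGGG"

moved = cut_and_paste(genome, 4, 10, 8)
copied = copy_and_paste(genome, 4, 10, 14)

print(moved)   # AAAACCCCTTGCAAGGGG        (same length, element relocated)
print(copied)  # AAAATTGCAACCCCTTGCAAGGGG  (longer, element now present twice)
```

Only the copy-and-paste mechanism increases genome length, which is consistent with copy-and-paste elements such as Alu reaching very high copy numbers.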


Pseudogenes

Often a result of spontaneous mutation, pseudogenes are dysfunctional genes derived from previously functional gene relatives. There are many mechanisms by which a functional gene can become a pseudogene, including the deletion or insertion of one or multiple nucleotides. This can result in a shift of the reading frame (so that the gene no longer codes for the expected protein), a premature stop codon, or a mutation in the promoter region. Often cited examples of pseudogenes within the human genome include the once functional olfactory gene families. Over time, many olfactory genes in the human genome became pseudogenes and were no longer able to produce functional proteins, explaining the poor sense of smell humans possess in comparison to their mammalian relatives.

Exon Shuffling

Exon shuffling is a mechanism by which new genes are created. This can occur when two or more exons from different genes are combined together or when exons are duplicated. Exon shuffling results in new genes by altering the current intron-exon structure. This can occur by any of the following processes: transposon mediated shuffling, sexual recombination or illegitimate recombination. Exon shuffling may introduce new genes into the genome that can be either selected against and deleted or selectively favored and conserved.

Genome Reduction and Gene Loss

Many species exhibit genome reduction when subsets of their genes are not needed anymore. This typically happens when organisms adapt to a parasitic lifestyle, e.g. when their nutrients are supplied by a host. As a consequence, they lose the genes needed to produce these nutrients. In many cases, there are both free-living and parasitic species that can be compared and their lost genes identified. Good examples are the genomes of Mycobacterium tuberculosis and Mycobacterium leprae, the latter of which has a dramatically reduced genome. Endosymbiont species provide another example. For instance, Polynucleobacter necessarius was first described as a cytoplasmic endosymbiont of the ciliate Euplotes aediculatus. The latter species dies soon after being cured of the endosymbiont. In the few cases in which P. necessarius is not present, a different and rarer bacterium apparently supplies the same function. No attempt to grow symbiotic P. necessarius outside their hosts has yet been successful, strongly suggesting that the relationship is obligate for both partners. Yet, closely related free-living relatives of P. necessarius have been identified. The endosymbionts have a significantly reduced genome when compared to their free-living relatives (1.56 Mbp vs. 2.16 Mbp).

Whole-Genome Duplication

Whole-genome duplication is characterized by an organism’s entire genetic information being copied once or multiple times.

Learning Objectives

State the evolutionary implications of whole-genome duplication

Key Takeaways

Key Points

  • Whole-genome duplication can provide an evolutionary advantage by providing the organism with multiple copies of a gene that is considered favorable.
  • Whole-genome duplication can result in divergence and formation of new species over time.
  • Whole-genome duplication can result in mutation and cause disease if the genes are rendered non-functional.

Key Terms

  • polyploidy: having more than the usual two homologous sets of chromosomes
  • palaeopolyploidization: the development of polyploid organisms in the geologic past
  • sympatric speciation: the process through which new species evolve from a single ancestral species while inhabiting the same geographic region

Whole-Genome Duplication

Gene duplication is the process by which a region of DNA coding for a gene creates additional copies of the gene. Similar to gene duplication, whole-genome duplication is the process by which an organism’s entire genetic information is copied, once or multiple times, which is known as polyploidy. This may provide an evolutionary benefit to the organism by supplying it with multiple copies of a gene, thus, creating a greater possibility of functional and selectively favored genes.


Polyploidy: This image shows haploid (single), diploid (double), triploid (triple), and tetraploid (quadruple) sets of chromosomes. Triploid and tetraploid chromosomes are examples of polyploidy.

Evolutionary importance

Paleopolyploidization events lead to massive cellular changes, including doubling of the genetic material, changes in gene expression and increased cell size. Gene loss during diploidization is not completely random, but heavily selected. Genes from large gene families are duplicated. On the other hand, individual genes are not duplicated. Overall, paleopolyploidy can have both short-term and long-term evolutionary effects on an organism’s fitness in the natural environment.

Genome diversity

Genome doubling provides organisms with redundant alleles that can evolve freely with little selection pressure. The duplicated genes can undergo neofunctionalization or subfunctionalization which could help the organism adapt to the new environment or survive different stress conditions.


Sympatric speciation can begin with a chromosomal error during meiosis or the formation of a hybrid individual with too many chromosomes, such as polyploidy which can occur during whole-genome duplication. Scientists have identified types of polyploidy that can lead to reproductive isolation of an individual in the polyploid state. In some cases a polyploid individual will have two or more complete sets of chromosomes from its own species in a condition called autopolyploidy. The other form of polyploidy occurs when individuals of two different species reproduce to form a viable offspring called an allopolyploid. The prefix “allo” means “other” (recall from allopatric); therefore, an allopolyploid occurs when gametes from two different species combine.

It has been suggested that many polyploidization events created new species, via a gain of adaptive traits, or by sexual incompatibility with their diploid counterparts. An example would be the recent speciation of allopolyploid Spartina — S. anglica; the polyploid plant is so successful that it is listed as an invasive species in many regions.

Evidence of Whole-Genome Duplication

In 1997, Wolfe & Shields gave evidence for an ancient duplication of the Saccharomyces cerevisiae (yeast) genome. It was initially noted that this yeast genome contained many individual gene duplications. Wolfe & Shields hypothesized that this was actually the result of an entire genome duplication in the yeast’s distant evolutionary history. They found 32 pairs of homologous chromosomal regions, accounting for over half of the yeast’s genome. They also noted that although homologs were present, they were often located on different chromosomes. Based on these observations, they determined that Saccharomyces cerevisiae underwent a whole-genome duplication soon after its evolutionary split from Kluyveromyces, a genus of ascomycetous yeasts. Over time, many of the duplicate genes were deleted or rendered non-functional. A number of chromosomal rearrangements broke the original duplicate chromosomes into the current manifestation of homologous chromosomal regions.

Gene Duplications and Divergence

Gene duplications create genetic redundancy and can have various effects, including detrimental mutations or divergent evolution.

Learning Objectives

Explain the mechanisms of gene duplication and divergence

Key Takeaways

Key Points

  • Ectopic recombination occurs when there is an unequal crossing-over; the products of this recombination are a duplication at the site of the exchange and a reciprocal deletion.
  • Gene duplications do not always result in detrimental mutations; they can contribute to divergent evolution, which causes genetic differences between groups to develop and eventually form new species.
  • Replication slippage can occur when there is an error during DNA replication and duplications of short genetic sequences are produced.
  • Retrotransposition occurs when the proteins of a retrovirus or retroelement, which copy their own genome by reverse transcribing RNA to DNA, aberrantly attach to cellular mRNA and reverse transcribe copies of genes to create retrogenes.
  • Aneuploidy can occur when there is a nondisjunction event at a single chromosome; the result is an abnormal number of chromosomes.
  • Genetic divergence can occur by mechanisms such as genetic drift, which contribute to the accumulation of independent genetic changes in two or more populations derived from a common ancestor.

Key Terms

  • paralogous: having a similar structure indicating divergence from a common ancestral gene
  • nondisjunction: the failure of chromosome pairs to separate properly during meiosis
  • retrogene: a DNA gene copied back from RNA by reverse transcription
  • genetic drift: an overall shift of allele distribution in an isolated population, due to random fluctuations in the frequencies of individual alleles of the genes

Gene Duplication

Gene duplication is the process by which a region of DNA coding for a gene is copied. Gene duplication can occur as the result of an error in recombination or through a retrotransposition event. Duplicate genes are often immune to the selective pressure under which genes normally exist. This can result in a large number of mutations accumulating in the duplicate gene code. This may render the gene non-functional or in some cases confer some benefit to the organism. There are multiple mechanisms by which gene duplication can occur.

Ectopic Recombination

Duplications can arise from unequal crossing-over that occurs during meiosis between misaligned homologous chromosomes. The product of this recombination is a duplication at the site of the exchange and a reciprocal deletion. Ectopic recombination is typically mediated by sequence similarity at the duplicate breakpoints, which form direct repeats. Repetitive genetic elements, such as transposable elements, offer one source of repetitive DNA that can facilitate recombination, and they are often found at duplication breakpoints in plants and mammals.
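Unequal crossing-over can be sketched by modeling each homologous chromosome as an ordered list of gene labels and exchanging arms at mismatched breakpoints. This is a simplified illustration; the gene labels and breakpoint positions are hypothetical.

```python
def unequal_crossover(chrom_a, chrom_b, break_a, break_b):
    """Exchange chromosome arms at the given breakpoints.

    When break_a == break_b the homologs are properly aligned and the
    products have the normal gene content. When the breakpoints differ
    (misalignment), one product gains genes and the other loses them.
    """
    product_1 = chrom_a[:break_a] + chrom_b[break_b:]
    product_2 = chrom_b[:break_b] + chrom_a[break_a:]
    return product_1, product_2

# Two identical homologs carrying hypothetical genes A, B, C, D.
homolog = ["A", "B", "C", "D"]

# Misaligned exchange: break after gene C on one homolog, after gene B on the other.
duplication, deletion = unequal_crossover(homolog, homolog, 3, 2)

print(duplication)  # ['A', 'B', 'C', 'C', 'D']  gene C is duplicated
print(deletion)     # ['A', 'B', 'D']            gene C is deleted (reciprocal product)
```

The two products are reciprocal: the gene gained by one chromosome is exactly the gene lost by the other, matching the description of a duplication and a reciprocal deletion.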


Gene Duplication: This figure indicates a schematic of a region of a chromosome before and after a duplication event. Ectopic recombination is typically mediated by sequence similarity at the duplicate breakpoints, which form direct repeats.

Replication Slippage

Replication slippage is an error in DNA replication, which can produce duplications of short genetic sequences. During replication, DNA polymerase begins to copy the DNA, and at some point during the replication process, the polymerase dissociates from the DNA and replication stalls. When the polymerase reattaches to the DNA strand, it aligns the replicating strand to an incorrect position and incidentally copies the same section more than once. Replication slippage is also often facilitated by repetitive sequence but requires only a few bases of similarity.


Retrotransposition

During cellular invasion by a replicating retroelement or retrovirus, viral proteins copy their genome by reverse transcribing RNA to DNA. If viral proteins attach irregularly to cellular mRNA, they can reverse-transcribe copies of genes to create retrogenes. Retrogenes usually lack intronic sequence and often contain poly-A sequences that are also integrated into the genome. Many retrogenes display changes in gene regulation in comparison to their parental gene sequences, which sometimes results in novel functions.


Aneuploidy

Aneuploidy occurs when nondisjunction at a single chromosome results in an abnormal number of chromosomes. Aneuploidy is often harmful and in mammals regularly leads to spontaneous abortions. Some aneuploid individuals are viable. For example, trisomy 21 in humans leads to Down syndrome, but it is not fatal. Aneuploidy often alters gene dosage in ways that are detrimental to the organism and is therefore unlikely to spread through populations.

Gene duplication as an evolutionary event

Gene duplications are an essential source of genetic novelty that can lead to evolutionary innovation. Duplication creates genetic redundancy: if one copy of a gene experiences a mutation that affects its original function, the second copy can serve as a ‘spare part’ and continue to function correctly. Thus, duplicate genes accumulate mutations faster than a functional single-copy gene over generations of organisms, and it is possible for one of the two copies to develop a new and different function. This is an example of neofunctionalization.

Gene duplication is believed to play a major role in evolution; this stance has been held by members of the scientific community for over 100 years. It has been argued that gene duplication is the most important evolutionary force since the emergence of the universal common ancestor.

Another possible fate for duplicate genes is that both copies are equally free to accumulate degenerative mutations, so long as any defects are complemented by the other copy. This leads to a neutral “subfunctionalization” model, in which the functionality of the original gene is distributed between the two copies. Neither gene can be lost, as both now perform important non-redundant functions, but ultimately neither is able to achieve novel functionality. Subfunctionalization can occur through neutral processes in which mutations accumulate with no detrimental or beneficial effects. However, in some cases subfunctionalization can occur with clear adaptive benefits. If an ancestral gene is pleiotropic and performs two functions, often neither of these functions can be changed without affecting the other. In this way, partitioning the ancestral functions into two separate genes can allow for adaptive specialization of subfunctions, thereby providing an adaptive benefit.


Genetic divergence is the process in which two or more populations of an ancestral species accumulate independent genetic changes through time, often after the populations have become reproductively isolated for some period of time. In some cases, subpopulations living in ecologically distinct peripheral environments can exhibit genetic divergence from the remainder of a population, especially where the range of a population is very large. The genetic differences among divergent populations can involve silent mutations (that have no effect on the phenotype) or give rise to significant morphological and/or physiological changes. Genetic divergence will always accompany reproductive isolation, either due to novel adaptations via selection and/or due to genetic drift, and is the principal mechanism underlying speciation.

Genetic drift or allelic drift is the change in the frequency of a gene variant (allele) in a population due to random sampling. The alleles in the offspring are a sample of those in the parents, and chance has a role in determining whether a given individual survives and reproduces. A population’s allele frequency is the fraction of the copies of one gene that share a particular form. Genetic drift may cause gene variants to disappear completely and thereby reduce genetic variation. When there are few copies of an allele, the effect of genetic drift is larger, and when there are many copies the effect is smaller. These changes in gene frequency can contribute to divergence.
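Random sampling of alleles from one generation to the next can be simulated in a few lines. The sketch below assumes a Wright-Fisher style model (constant population size, non-overlapping generations, no selection or mutation), which the text does not name; the population sizes, starting frequency, and seed are arbitrary illustrative choices.

```python
import random

def simulate_drift(pop_size, start_freq, generations, seed=0):
    """Track one allele's frequency under pure random sampling.

    pop_size is the number of diploid individuals, so each generation
    contains 2 * pop_size gene copies. Each copy in the next generation
    is drawn independently, carrying the allele with probability equal
    to the current frequency.
    """
    rng = random.Random(seed)          # seeded for reproducible runs
    copies = 2 * pop_size
    freq = start_freq
    history = [freq]
    for _generation in range(generations):
        count = sum(1 for _copy in range(copies) if rng.random() < freq)
        freq = count / copies
        history.append(freq)
    return history

# Same starting frequency, different population sizes (hypothetical values).
small = simulate_drift(pop_size=10, start_freq=0.5, generations=50)
large = simulate_drift(pop_size=1000, start_freq=0.5, generations=50)

print(small[-1], large[-1])
```

Running this with several seeds typically shows the small population wandering much further from 0.5, and often losing or fixing the allele entirely, while the large population stays closer to its starting frequency.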

Divergent evolution is usually a result of diffusion of the same species to different and isolated environments, which blocks the gene flow among the distinct populations, allowing differentiated fixation of characteristics through genetic drift and natural selection. Divergent evolution can also be applied to molecular biology characteristics. This could apply to a pathway in two or more organisms or cell types. This can apply to genes and proteins, such as nucleotide sequences or protein sequences that are derived from two or more homologous genes. Both orthologous genes (resulting from a speciation event) and paralogous genes (resulting from gene duplication within a population) can be said to display divergent evolution.

Noncoding DNA

Noncoding DNA consists of sequences of DNA that do not encode protein sequences but can be transcribed to produce important regulatory molecules.

Learning Objectives

Summarize the importance of noncoding DNA

Key Takeaways

Key Points

  • In the human genome, over 98% of DNA is classified as noncoding DNA; this includes sequences transcribed into noncoding RNAs (e.g. tRNAs, rRNAs) as well as origins of DNA replication, centromeres, telomeres and scaffold attachment regions (SARs).
  • Noncoding regions are commonly referred to as ‘junk DNA’; however, this term is misleading, as noncoding DNA does have functional importance.
  • The proportion of coding and noncoding DNA within organisms varies and the amount of noncoding DNA typically correlates with organism complexity, though there are many notable exceptions.

Key Terms

  • intergenic: describing the noncoding sections of nucleic acid between genes
  • noncoding: DNA which does not code for protein
  • intron: a portion of a split gene that is included in pre-RNA transcripts but is removed during RNA processing and rapidly degraded

Noncoding DNA

In genomics and related disciplines, noncoding DNA sequences are components of an organism’s DNA that do not encode protein sequences. Some noncoding DNA is transcribed into functional noncoding RNA molecules (e.g. transfer RNA, ribosomal RNA, and regulatory RNAs), while others are not transcribed or give rise to RNA transcripts of unknown function. The amount of noncoding DNA varies greatly among species. For example, over 98% of the human genome is noncoding DNA, while only about 2% of a typical bacterial genome is noncoding DNA.

Initially, a large proportion of noncoding DNA had no known biological function and was therefore sometimes referred to as “junk DNA”, particularly in the lay press. However, many types of noncoding DNA sequences do have important biological functions, including the transcriptional and translational regulation of protein-coding sequences, origins of DNA replication, centromeres, telomeres, scaffold attachment regions (SARs), genes for functional RNAs, and many others. Other noncoding sequences have likely, but as-yet undetermined, functions. Some sequences may have no biological function for the organism, such as endogenous retroviruses.

Genomic Variation between Organisms

The amount of total genomic DNA varies widely between organisms, and the proportion of coding and noncoding DNA within these genomes varies greatly as well. More than 98% of the human genome does not encode protein sequences, including most sequences within introns and most intergenic DNA. While overall genome size, and by extension the amount of noncoding DNA, are correlated to organism complexity, there are many exceptions. For example, the genome of the unicellular Polychaos dubium (formerly known as Amoeba dubia) has been reported to contain more than 200 times the amount of DNA in humans. The pufferfish Takifugu rubripes genome is only about one eighth the size of the human genome, yet seems to have a comparable number of genes; approximately 90% of the Takifugu genome is noncoding DNA.

In 2013, a new “record” for most efficient genome was discovered. Utricularia gibba, a bladderwort plant, has only 3% noncoding DNA. The extensive variation in nuclear genome size among eukaryotic species is known as the C-value enigma or C-value paradox. Most of the genome size difference appears to lie in the noncoding DNA. About 80 percent of the nucleotide bases in the human genome may be transcribed, but transcription does not necessarily imply function.


Utricularia gibba flower: Utricularia gibba has 3% noncoding DNA, which is low for flowering plants. This 3% has given this plant the title of the ‘most efficient’ genome.

Variations in Size and Number of Genes

The genome size does not always correlate with the complexity of the organism and, in fact, shows great variation in size and gene number.

Learning Objectives

Describe how variations in the size and number of genes can arise through evolutionary mechanisms

Key Takeaways

Key Points

  • Harmless mutations and sexual recombination of chromosomes may allow the evolution of new characteristics.
  • Genome size can be affected by various events, including duplication, insertion, recombination, deletion and polyploidization events.
  • Genome size can be affected by the evolution of an organism, which can result in an increased or decreased need for specific genes for survival based on behavior.
  • The human genome exemplifies the concept that complexity does not always correlate with an increase in genome size; there are fewer protein coding genes present than expected relative to the genome size.

Key Terms

  • polyploidization: hybridization that leads to polyploidy
  • pseudogene: a segment of DNA that is part of the genome of an organism, and which is similar to a gene but does not code for a gene product
  • genome: the cell’s complete genetic information packaged as a double-stranded DNA molecule

Variations in Size and Number of Genes

Genetic diversity refers to any variation in the nucleotides, genes, chromosomes, or whole genomes of organisms. Genetic diversity at its most elementary level is represented by differences in the sequences of nucleotides (adenine, cytosine, guanine, and thymine) that form the DNA (deoxyribonucleic acid) within the cells of the organism. The DNA is contained in the chromosomes present within the cell; some chromosomes are contained within specific organelles in the cell (for example, the chromosomes of mitochondria and chloroplasts). Nucleotide variation is measured for discrete sections of the chromosomes, called genes. Thus, each gene comprises a hereditary section of DNA that occupies a specific place on the chromosome and controls a particular characteristic of an organism.


Many organisms are diploid, having two sets of chromosomes, and therefore two copies (called alleles) of each gene. However, some organisms can be haploid, triploid, or tetraploid (having one, three, or four sets of chromosomes respectively). Within any single organism, there may be variation between the two (or more) alleles for each gene. This variation is introduced either through mutation of one of the alleles, or as a result of sexual reproduction.

During sexual reproduction, offspring inherit alleles from both parents and these alleles might be slightly different, especially if there has been migration or hybridization of organisms, so that the parents may come from different populations and gene pools. Also, when gametes are formed during meiosis, paired chromosomes can exchange segments in a process called sexual recombination. Harmless mutations and sexual recombination may allow the evolution of new characteristics.

Genome Size and Number

Genome size is usually measured in base pairs (or bases in single-stranded DNA or RNA). The C-value is another measure of genome size. The C-value refers to the amount, in picograms, of DNA contained within a haploid nucleus (e.g. a gamete) or one half the amount in a diploid somatic cell of a eukaryotic organism. In some cases (notably among diploid organisms), the terms C-value and genome size are used interchangeably; however, in polyploids the C-value may represent two or more genomes contained within the same nucleus.
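The two measures can be related with a short calculation. The sketch below assumes the commonly used approximation that 1 picogram of double-stranded DNA corresponds to about 0.978 × 10^9 base pairs; the function names and the example C-value are hypothetical and serve only as an illustration.

```python
# Assumed conversion factor: 1 pg of double-stranded DNA is roughly
# 0.978e9 base pairs (a widely used approximation, not an exact constant).
BP_PER_PICOGRAM = 0.978e9


def c_value_to_base_pairs(c_value_pg: float) -> float:
    """Convert a C-value (pg of DNA per haploid nucleus) to base pairs."""
    if c_value_pg < 0:
        raise ValueError("C-value must be non-negative")
    return c_value_pg * BP_PER_PICOGRAM


def base_pairs_to_c_value(base_pairs: float) -> float:
    """Convert a haploid genome size in base pairs to a C-value in pg."""
    if base_pairs < 0:
        raise ValueError("genome size must be non-negative")
    return base_pairs / BP_PER_PICOGRAM


def diploid_dna_content_pg(c_value_pg: float) -> float:
    """DNA content of a diploid somatic nucleus: twice the C-value."""
    return 2 * c_value_pg


if __name__ == "__main__":
    example_c_value = 3.5  # hypothetical C-value in picograms
    size_bp = c_value_to_base_pairs(example_c_value)
    print(f"C-value {example_c_value} pg is about {size_bp:.3e} bp")
    print(f"Diploid nucleus: {diploid_dna_content_pg(example_c_value)} pg")
```

Note that for a polyploid the converted figure describes all the DNA in the haploid nucleus, which may span two or more genomes, so it should not be read as the size of a single genome.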

Different species can have different numbers of genes within their genomes. However, a greater total number of genes does not necessarily correspond to greater observable complexity in the anatomy and physiology of the organism (i.e. greater phenotypic complexity). For example, the predicted number of genes in the human genome is not much larger than in some invertebrates and plants, and may even be smaller than in the Indian rice genome. In humans, more proteins are encoded per gene than in many other species. In prokaryotic genomes, research has shown a significant positive correlation between the C-value and the number of genes that compose the genome. This indicates that gene number is the main factor influencing the size of the prokaryotic genome.

Genes vs Genome Size

In eukaryotic organisms, a paradox is observed: the number of genes that make up the genome does not correlate with genome size. In other words, the genome size is much larger than would be expected given the total number of protein-coding genes. Genome size can increase by duplication, insertion, or polyploidization, and the process of recombination can lead to either DNA loss or gain. Genomes can also shrink due to deletions.
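One way to make this contrast concrete is to compare gene density, the number of protein-coding genes per million base pairs. The sketch below uses rounded, commonly cited approximations for one prokaryote and for humans; these figures and the function name are assumptions introduced here for illustration and are not taken from the text above.

```python
def genes_per_megabase(gene_count: int, genome_size_bp: float) -> float:
    """Return the number of genes per million base pairs of genome."""
    if genome_size_bp <= 0:
        raise ValueError("genome size must be positive")
    return gene_count / (genome_size_bp / 1e6)


# Rounded, approximate figures used only for illustration (assumptions).
EXAMPLE_GENOMES = {
    "Escherichia coli K-12": {"size_bp": 4.6e6, "protein_coding_genes": 4300},
    "Homo sapiens": {"size_bp": 3.1e9, "protein_coding_genes": 20000},
}

if __name__ == "__main__":
    for name, genome in EXAMPLE_GENOMES.items():
        density = genes_per_megabase(
            genome["protein_coding_genes"], genome["size_bp"]
        )
        print(f"{name}: {density:.1f} protein-coding genes per Mbp")
```

With these approximate inputs, the prokaryote packs hundreds of genes into each megabase while the human genome carries fewer than ten, which is consistent with gene number tracking genome size in prokaryotes but not in eukaryotes.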


Gene variation in the Genome: This figure represents the human genome, categorized by function of each gene product, given both as number of genes and as percentage of all genes. Importantly, genome size does not necessarily correlate with complexity.

A famous example of genome reduction through gene decay is the genome of Mycobacterium leprae, the causative agent of leprosy. M. leprae has lost many once-functional genes over time through the formation of pseudogenes. This is evident when its genome is compared with that of its close relative Mycobacterium tuberculosis. M. leprae lives and replicates inside a host, so it no longer needs many of the genes that once allowed it to live and prosper outside the host. Over time these genes have lost their function through mechanisms such as mutation, becoming pseudogenes. Shedding non-essential genes can benefit an organism because it makes replicating its DNA faster and more energy-efficient.

An example of increasing genome size over time is seen in filamentous plant pathogens. The genomes of these pathogens have grown larger over evolutionary time due to repeat-driven expansion. The repeat-rich regions contain genes coding for host-interaction proteins. As more repeats are added to these regions, the pathogens increase the possibility of developing new virulence factors through mutation and genetic recombination. In this way, larger genomes can be beneficial for these plant pathogens.