Genome Evolution

Gene Families

Gene families are groups of functionally related genes arising from a duplicated gene.

Learning Objectives

Distinguish gene families from gene complexes

Key Takeaways

Key Points

  • Genes duplicate over evolutionary time, and repeated duplication can produce families of related genes. Because the members descend from the same progenitor gene, they often have related biochemical functions.
  • Gene families can expand and contract over evolutionary time scales.
  • Gene complexes are groups of genes that work in a fashion similar to gene families; however, they need not arise from gene duplication.

Key Terms

  • phylogenetic: Of or relating to the evolutionary development of organisms.
  • secondary structure: The local three-dimensional folding of segments of a biopolymer such as DNA or a protein (for example, the α-helices and β-sheets of a protein).

A gene family is a set of several similar genes, formed by duplication of a single original gene, that generally have similar biochemical functions. One such family comprises the genes for the human haemoglobin subunits. The 10 genes lie in two clusters on different chromosomes, called the α-globin and β-globin loci. Genes are categorized into families based on shared nucleotide or protein sequences. Phylogenetic techniques can be used as a more rigorous test. The positions of exons within the coding sequence can also be used to infer common ancestry. When the sequence of the encoded protein is known, researchers can compare protein sequences directly, which often provides more information than comparing DNA sequences. Furthermore, knowledge of the protein’s secondary structure gives further information about ancestry, since the organization of secondary structural elements would presumably be conserved even if the amino acid sequence changes considerably.
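
The idea of grouping genes into families by shared sequence can be sketched in a few lines of Python. This is only a toy illustration, not a method described in the text: the gene names and sequences are invented, the sequences are assumed to be already aligned to equal length, and the 0.7 identity threshold is an arbitrary assumption. Real analyses rely on proper alignment and phylogenetic tools.

```python
from itertools import combinations


def percent_identity(a: str, b: str) -> float:
    """Fraction of matching positions in two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    # Positions where both sequences carry a gap ("-") do not count as matches.
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return matches / len(a)


def group_into_families(seqs: dict, threshold: float = 0.7) -> list:
    """Single-linkage grouping: two genes share a family if their identity
    meets the threshold, directly or through a chain of other members."""
    families = {name: {name} for name in seqs}
    for g1, g2 in combinations(seqs, 2):
        if percent_identity(seqs[g1], seqs[g2]) >= threshold:
            merged = families[g1] | families[g2]
            for member in merged:
                families[member] = merged
    unique = []
    for family in families.values():
        if family not in unique:
            unique.append(family)
    return unique


# Hypothetical toy sequences, invented for illustration only.
toy_genes = {
    "geneA1": "ATGGTGCACCTGACT",
    "geneA2": "ATGGTGCATCTGACT",  # differs from geneA1 at one position
    "geneB1": "ATGCCCTTTAAAGGG",
}

families = group_into_families(toy_genes)
print(families)
```

With these toy inputs, geneA1 and geneA2 fall into one family and geneB1 stands alone. Single-linkage grouping is the simplest choice here; it can chain distantly related genes together, which is one reason phylogenetic methods are the more rigorous test.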

Evolution of a Gene Family: Unequal crossing over generates gene families. The left side illustrates an unequal crossing over event and the two products that are generated. One product carries a deletion and the other a duplication of the same region. In this example, the duplicated region contains a second complete copy of a single gene (B). The right side illustrates a second round of unequal crossing over that can occur in a genome that is homozygous for the original duplicated chromosome. In this case, the crossover event has occurred between the two copies of the original gene. Only the duplicated product generated by this event is shown. Over time, the three copies of the B gene can diverge into three distinct functional units (B1, B2, and B3) of a gene family cluster.
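
The two rounds of unequal crossing over described in this caption can be mimicked with lists of gene labels. The sketch below is a simplified model under stated assumptions: chromosomes are plain Python lists, the function name `unequal_crossover` and the break positions are hypothetical, and breaks fall between genes rather than within them.

```python
def unequal_crossover(chrom1, chrom2, break1, break2):
    """Exchange the arms of two chromosomes broken at different positions.

    Returns two products: the left arm of chrom1 joined to the right arm
    of chrom2, and the left arm of chrom2 joined to the right arm of chrom1.
    """
    product1 = chrom1[:break1] + chrom2[break2:]
    product2 = chrom2[:break2] + chrom1[break1:]
    return product1, product2


original = ["A", "B", "C"]

# First round: misaligned pairing breaks one chromosome after B
# and the other before B.
duplicated, deleted = unequal_crossover(original, original, break1=2, break2=1)
print(duplicated)  # ['A', 'B', 'B', 'C']
print(deleted)     # ['A', 'C']

# Second round, in a genome homozygous for the duplicated chromosome:
# the crossover falls between the two copies of B.
triplicated, restored = unequal_crossover(duplicated, duplicated, break1=3, break2=2)
print(triplicated)  # ['A', 'B', 'B', 'B', 'C']
print(restored)     # ['A', 'B', 'C']
```

The model only tracks copy number. The later divergence of the three B copies into B1, B2, and B3 depends on mutation and selection, which are not modeled here.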

The methods used to assign genes to families often rely on predictions based on the DNA sequence. If the genes of a gene family encode proteins, the term protein family is often used in an analogous manner to gene family. The expansion or contraction of gene families along a specific lineage can be due to chance or can be the result of natural selection. Distinguishing between these two cases is often difficult in practice. Recent work uses a combination of statistical models and algorithmic techniques to detect gene families that are under the effect of natural selection.

In contrast, gene complexes are simply tightly linked groups of genes, often created via gene duplication (sometimes called segmental duplication if the duplicates remain side-by-side). Here, each gene has a similar though slightly diverged function.

Genomics and Biofuels

Microbial genomics can be used to create new biofuels.

Learning Objectives

Explain the process of creating new biofuels by using microbial genomics

Key Takeaways

Key Points

  • Microorganisms can encode new enzymes and produce new organic compounds that can be used as biofuels.
  • Genomic analysis of the fungus Pichia will allow optimization of its use in fermenting ethanol fuels.
  • Analysis of the microbes in the hindgut of termites has found more than 500 genes that may be useful in the enzymatic breakdown of cellulose.
  • Genetic markers have been used in forensic analysis, as in 2001, when the FBI used microbial genomics to identify the specific strain of anthrax found in several pieces of mail.
  • Genomics is used in agriculture to develop plants with more desirable traits, such as drought and disease resistance.

Key Terms

  • renewable resource: a natural resource that is replenished by natural processes at a rate comparable to its rate of consumption by humans or other users
  • biofuel: any fuel that is obtained from a renewable biological resource

Knowledge of the genomics of microorganisms is being used to find better ways to harness biofuels from algae and cyanobacteria. The primary sources of fuel today are coal, oil, wood, and other plant products, such as ethanol. Although plants are renewable resources, there is still a need to find more alternative renewable sources of energy to meet our population’s energy demands. The microbial world is one of the largest resources for genes that encode new enzymes and produce new organic compounds, and it remains largely untapped.

For microbial biomass breakdown, many candidates have already been identified. These include Clostridia species, for their ability to degrade cellulose, and fungi that express genes associated with the decomposition of the most recalcitrant component of the plant cell wall: lignin, the phenolic “glue” that imbues the plant with structural integrity and pest resistance. The white rot fungus Phanerochaete chrysosporium produces unique extracellular oxidative enzymes that effectively degrade lignin by gaining access through the protective matrix surrounding the cellulose microfibrils of plant cell walls.

Another fungus, the yeast Pichia stipitis, ferments the five-carbon “wood sugar” xylose, which is abundant in hardwoods and agricultural harvest residue. Pichia’s recently sequenced genome has revealed insights into the metabolic pathways responsible for this process, guiding efforts to optimize this capability in commercial production strains. Pathway engineering promises to produce a wider variety of organisms able to ferment the full repertoire of sugars derived from cellulose and hemicellulose and to tolerate higher ethanol concentrations, optimizing fuel yields. For instance, the hindgut contents of nature’s own bioreactor, the termite, have yielded more than 500 genes related to the enzymatic deconstruction of cellulose and hemicellulose.

Termites: Nature’s Bioreactors: The hindgut of the termite has yielded more than 500 microbial genes related to the enzymatic deconstruction of cellulose.

Genome Reduction

Genome reduction is a decrease in the genome size of a species relative to its ancestors.

Learning Objectives

Review genome reduction

Key Takeaways

Key Points

  • Genomes tend to increase in size over time; however, exceptions occur.
  • Genome reduction is observed in species that depend on a host for survival. The most extreme examples are eukaryotic organelles of bacterial origin, such as mitochondria.
  • As an endosymbiont becomes dependent on its host, it becomes an obligate endosymbiont; during this time the genome is reduced due to deletions of genes not needed to live in the host. Parts of the genome can be transferred to the host as well, leading to further genome reduction.

Key Terms

  • genome: The complete genetic information (either DNA or, in some viruses, RNA) of an organism, typically expressed in the number of base pairs.
  • endosymbiont: An organism that lives within the body or cells of another organism.

Genome size is the total amount of DNA contained within one copy of a single genome. Over evolutionary time, genomes tend to increase in size through duplications within the genome and the accumulation of additional genetic elements. The opposite process, genome reduction, also occurs. Genome reduction, also known as genome degradation, is the process by which a genome shrinks relative to its ancestor. Genomes fluctuate in size regularly; however, genome size reduction is most significant in bacteria.

The most evolutionarily significant cases of genome reduction may be the eukaryotic organelles that are derived from bacteria: the mitochondrion and the plastid. These organelles are descended from endosymbionts, which can only survive within the host cell and which the host cell likewise needs for survival. Many mitochondria have fewer than 20 genes in their entire genome, whereas a free-living bacterium generally has at least 1,000 genes. Many genes have been transferred to the host nucleus, while others have simply been lost and their functions replaced by host processes.

Other bacteria have become endosymbionts or obligate intracellular pathogens and have experienced extensive genome reduction as a result. This process seems to be dominated by genetic drift resulting from small population size, low recombination rates, and high mutation rates, as opposed to selection for smaller genomes. Some free-living marine bacterioplankton also show signs of genome reduction, which is hypothesized to be driven by natural selection.

Variation in genome sizes: A graph showing the relative sizes of genomes. In general, more “complex” organisms have larger genomes.

Obligate endosymbiotic species are characterized by a complete inability to survive outside their host environment. These species have become a considerable threat to human health, as they are often highly capable of evading human immune systems and manipulating the host environment to acquire nutrients. A common explanation for these keen manipulative abilities is the compact and efficient genomic structure consistently found in obligate endosymbionts. This compact genome structure is the result of massive losses of extraneous DNA, an occurrence that is exclusively associated with the loss of a free-living stage. In fact, as much as 90% of the genetic material can be lost when a species makes the evolutionary transition from a free-living to an obligate intracellular lifestyle.

Common examples of species with reduced genomes include Buchnera aphidicola, Rickettsia prowazekii, and Mycobacterium leprae. One obligate endosymbiont of psyllids, Candidatus Carsonella ruddii, has one of the smallest genomes known among cellular organisms, at about 160 kb. It is important to note, however, that some obligate intracellular species have positive fitness effects on their hosts.

The reductive evolution model has been proposed as an effort to define the genomic commonalities seen in all obligate endosymbionts. This model illustrates four general features of reduced genomes and obligate intracellular species: ‘genome streamlining’ resulting from relaxed selection on genes that are superfluous in the intracellular environment; a bias towards deletions (rather than insertions), which heavily affects genes that have been disrupted by accumulation of mutations (pseudogenes); very little or no capability for acquiring new DNA; and considerable reduction of effective population size in endosymbiotic populations, particularly in species that rely on vertical transmission. Based on this model, it is clear that endosymbionts face different adaptive challenges than free-living species.
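
To make the deletion bias in this model concrete, the following Python sketch tracks genome size through a series of insertion and deletion events. It is a deliberately crude illustration and not a published model: the starting size of 4,000 kb, the 1 kb event size, the number of events, and the 0.8 deletion probability are all hypothetical values chosen for the example.

```python
import random


def fraction_lost(ancestor_kb: float, descendant_kb: float) -> float:
    """Fraction of the ancestral genome that is missing from the descendant."""
    return (ancestor_kb - descendant_kb) / ancestor_kb


def simulate_indels(genome_kb, n_events, deletion_prob, indel_kb=1.0, seed=0):
    """Apply n_events insertions or deletions of fixed size to a genome.

    Each event is a deletion with probability deletion_prob and an
    insertion otherwise. Genome size never drops below zero.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    size = float(genome_kb)
    for _ in range(n_events):
        if rng.random() < deletion_prob:
            size = max(0.0, size - indel_kb)
        else:
            size += indel_kb
    return size


# Hypothetical free-living ancestor of 4,000 kb (assumed value).
ancestor_kb = 4000
biased = simulate_indels(ancestor_kb, n_events=1000, deletion_prob=0.8)
unbiased = simulate_indels(ancestor_kb, n_events=1000, deletion_prob=0.5)

print(f"deletion-biased genome: {biased:.0f} kb")
print(f"unbiased genome:        {unbiased:.0f} kb")

# A 90% loss, as mentioned above, would leave 400 kb of a 4,000 kb ancestor.
print(fraction_lost(4000, 400))
```

With a deletion probability above 0.5 the genome shrinks on average, whereas an unbiased process only drifts around its starting size. The sketch ignores selection and population size, so it captures just one of the four features listed above.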

Pathogenicity Islands

Pathogenicity islands are discrete genetic loci that encode factors which make a microbe more virulent.

Learning Objectives

Describe pathogenicity islands

Key Takeaways

Key Points

  • A bacterial host may carry more than one pathogenicity island.
  • Pathogenicity islands are transferred horizontally, through plasmids or transposons.
  • The addition of a pathogenicity island to a non-invasive species can make the non-invasive species pathogenic.

Key Terms

  • horizontal gene transfer: The transfer of genetic material from one organism to another one that is not its offspring; especially common among bacteria.
  • transposon: A segment of DNA that can move to a different position within a genome.
  • integrase: Any enzyme that integrates viral DNA into that of an infected cell.

Horizontal Transfer: Pathogenicity islands are transferred horizontally; this figure details some of the ways in which that occurs.

Pathogenicity islands (PAIs) are a distinct class of genomic islands acquired by microorganisms through horizontal gene transfer. They are incorporated in the genome of pathogenic organisms, but are usually absent from nonpathogenic organisms of the same or closely related species. These mobile genetic elements may range from 10 to 200 kb and encode genes which contribute to the virulence of the respective pathogen. Typical examples are adherence factors, toxins, iron uptake systems, invasion factors, and secretion systems. Pathogenicity islands are discrete genetic units flanked by direct repeats, insertion sequences, or tRNA genes, which act as sites for recombination into the DNA. Cryptic mobility genes may also be present, suggesting that the island was acquired by transduction. One species of bacteria may have more than one PAI (e.g., Salmonella has at least five). They are transferred through horizontal gene transfer events such as transfer by a plasmid, phage, or conjugative transposon.

Pathogenicity islands carry genes encoding one or more virulence factors, including, but not limited to, adhesins, toxins, or invasins. They may be located on a bacterial chromosome or may be transferred within a plasmid. The GC-content of pathogenicity islands often differs from that of the rest of the genome, potentially aiding in their detection within a given DNA sequence. PAIs are flanked by direct repeats; that is, the sequence of bases at the two ends of the inserted sequence is the same. They carry functional genes, such as those encoding integrases or transposases, or parts of insertion sequences, which enable insertion into host DNA. PAIs are often associated with tRNA genes, which serve as target sites for this integration event. They can be transferred as a single unit to new bacterial cells, thus conferring virulence to formerly benign strains.
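
The GC-content signal mentioned above can be illustrated with a sliding-window scan. The Python sketch below is a minimal example under stated assumptions: the toy sequence is invented, and the window size, step, and tolerance are arbitrary values picked so the example is easy to follow. Real genomes call for much larger windows, and an atypical GC-content alone does not prove that a region is a pathogenicity island.

```python
def gc_content(seq: str) -> float:
    """Fraction of bases in seq that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def flag_atypical_windows(genome: str, window: int = 10, step: int = 10,
                          tolerance: float = 0.2) -> list:
    """Return (start, end, gc) for windows whose GC-content differs from
    the genome-wide GC-content by more than the tolerance."""
    genome_gc = gc_content(genome)
    flagged = []
    for start in range(0, len(genome) - window + 1, step):
        window_gc = gc_content(genome[start:start + window])
        if abs(window_gc - genome_gc) > tolerance:
            flagged.append((start, start + window, window_gc))
    return flagged


# Hypothetical toy "genome": an AT-rich background with one GC-rich insert.
toy_genome = "ATATATATAT" * 3 + "GCGCGCGCGC" + "ATATATATAT" * 3

for start, end, gc in flag_atypical_windows(toy_genome):
    print(f"positions {start}-{end}: GC = {gc:.2f}")
```

In this toy sequence only the GC-rich insert at positions 30 to 40 is flagged. In practice, such a scan would be combined with the other hallmarks described above, such as flanking direct repeats, nearby tRNA genes, and mobility genes.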