Prokaryotic Genomes

Bacterial Chromosomes in the Nucleoid

The nucleoid is an irregularly-shaped region within the cell of a prokaryote that contains all or most of the genetic material.

Learning Objectives

Evaluate the nucleoid in prokaryotes

Key Takeaways

Key Points

  • The genome of prokaryotic organisms generally is a circular, double-stranded piece of DNA, multiple copies of which may exist at any time.
  • The length of a genome varies widely, but is generally at least a few million base pairs.
  • A genophore is the DNA of a prokaryote. It is commonly referred to as a prokaryotic chromosome.

Key Terms

  • nucleoid: The irregularly-shaped region within a prokaryotic cell where the genetic material is localized.
  • prokaryote: An organism characterized by the absence of a nucleus or any other membrane-bound organelles.
  • genome: The complete genetic information (either DNA or, in some viruses, RNA) of an organism, typically expressed in the number of basepairs.

The Nucleoid

The nucleoid (meaning nucleus-like) is an irregularly-shaped region within the cell of a prokaryote that contains all or most of the genetic material. In contrast to the nucleus of a eukaryotic cell, it is not surrounded by a nuclear membrane. The genome of prokaryotic organisms generally is a circular, double-stranded piece of DNA, of which multiple copies may exist at any time. The length of a genome varies widely, but is generally at least a few million base pairs.

Prokaryote cell nucleoid: Prokaryote cell (right) showing the nucleoid in comparison to a eukaryotic cell (left) showing the nucleus.

The nucleoid can be visualized on an electron micrograph at high magnification, where it stands out clearly against the cytosol. Sometimes even strands of what is thought to be DNA are visible. The nucleoid can also be seen under a light microscope by staining it with the Feulgen stain, which specifically stains DNA. The DNA-binding fluorescent stains DAPI and ethidium bromide are widely used for fluorescence microscopy of nucleoids.

Experimental evidence suggests that the nucleoid is composed largely of DNA (about 60%), plus smaller amounts of RNA and protein. The latter two constituents are likely to be mainly messenger RNA and the transcription factor proteins that regulate the bacterial genome. Proteins that help maintain the supercoiled structure of the nucleic acid are known as nucleoid proteins or nucleoid-associated proteins, and are distinct from the histones of eukaryotic nuclei. In contrast to histones, the DNA-binding proteins of the nucleoid do not form nucleosomes, in which DNA is wrapped around a protein core. Instead, these proteins often use other mechanisms, such as DNA looping, to promote compaction.

The Genophore

A genophore is the DNA of a prokaryote. It is commonly referred to as a prokaryotic chromosome. The term “chromosome” is misleading, because the genophore lacks chromatin. The genophore is compacted through a mechanism known as supercoiling, whereas a eukaryotic chromosome is additionally compacted through the use of chromatin. The genophore is circular in most prokaryotes, and linear in very few. The circular nature of the genophore allows replication to occur without telomeres. Genophores are generally much smaller than eukaryotic chromosomes. A genophore can be as small as 580,073 base pairs (Mycoplasma genitalium). Many eukaryotes (such as plants and animals) carry genophores in organelles such as mitochondria and chloroplasts. These organelles are very similar to true prokaryotes.

Supercoiling

DNA supercoiling refers to the over- or under-winding of a DNA strand, and is an expression of the strain on that strand.

Learning Objectives

Assess the role of supercoiling in prokaryotic genomes

Key Takeaways

Key Points

  • As a general rule, the DNA of most organisms is negatively supercoiled.
  • The figure eight is the simplest supercoil, and is the shape a circular DNA assumes to accommodate one too many or one too few helical twists.
  • DNA supercoiling is important for DNA packaging within all cells.

Key Terms

  • supercoiling: The coiling of the DNA helix upon itself; can cause disruption to transcription and lead to cell death.
  • DNA: A biopolymer of deoxyribonucleotides (a type of nucleic acid) that has four different chemical groups, called bases: adenine, guanine, cytosine, and thymine.
  • chromosome: A structure in the cell nucleus that contains DNA, histone protein, and other structural proteins.

DNA supercoiling refers to the over- or under-winding of a DNA strand, and is an expression of the strain on that strand. Supercoiling is important in a number of biological processes, such as compacting DNA. Additionally, certain enzymes such as topoisomerases are able to change DNA topology to facilitate functions such as DNA replication or transcription. Mathematical expressions are used to describe supercoiling by comparing different coiled states to relaxed B-form DNA.

Supercoiled Structure of Circular DNA: This is a supercoiled structure of circular DNA molecules with low writhe. Note that the helical nature of the DNA duplex is omitted for clarity.

As a general rule, the DNA of most organisms is negatively supercoiled.

In a “relaxed” double-helical segment of B-DNA, the two strands twist around the helical axis once every 10.4 to 10.5 base pairs of sequence. Adding or subtracting twists, as some enzymes can do, imposes strain. If a DNA segment under twist strain were closed into a circle by joining its two ends and then allowed to move freely, the circular DNA would contort into a new shape, such as a simple figure-eight. Such a contortion is a supercoil.

The figure eight is the simplest supercoil, and is the shape a circular DNA assumes to accommodate one too many or one too few helical twists. The two lobes of the figure eight will appear rotated either clockwise or counterclockwise with respect to one another, depending on whether the helix is overwound or underwound. For each additional helical twist being accommodated, the lobes will show one more rotation about their axis.

The noun form “supercoil” is rarely used in the context of DNA topology. Instead, global contortions of a circular DNA, such as the rotation of the figure-eight lobes above, are referred to as writhe. The above example illustrates that twist and writhe are interconvertible. “Supercoiling” is an abstract mathematical property representing the sum of twist and writhe. The twist is the number of helical turns in the DNA and the writhe is the number of times the double helix crosses over on itself (these are the supercoils).

Extra helical twists are positive and lead to positive supercoiling, while subtractive twisting causes negative supercoiling. Many topoisomerase enzymes sense supercoiling and either generate or dissipate it as they change DNA topology. DNA of most organisms is negatively supercoiled.
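
The bookkeeping behind these statements can be sketched numerically. The sketch below assumes the standard relation for closed circular DNA, in which the linking number Lk is the sum of twist and writhe, and it measures supercoiling against a relaxed reference of one turn per 10.5 base pairs. The function names, the 5,250 bp plasmid, and its linking number of 470 are illustrative assumptions, not values from the text.

```python
BP_PER_TURN = 10.5  # helical repeat of relaxed B-DNA (the text gives 10.4 to 10.5)


def relaxed_linking_number(n_bp, bp_per_turn=BP_PER_TURN):
    """Linking number Lk0 of a relaxed circle of n_bp base pairs."""
    return n_bp / bp_per_turn


def linking_difference(lk, n_bp, bp_per_turn=BP_PER_TURN):
    """Delta Lk = Lk - Lk0; negative means underwound, positive overwound."""
    return lk - relaxed_linking_number(n_bp, bp_per_turn)


def superhelical_density(lk, n_bp, bp_per_turn=BP_PER_TURN):
    """Sigma = Delta Lk / Lk0, a length-independent measure of supercoiling."""
    lk0 = relaxed_linking_number(n_bp, bp_per_turn)
    return (lk - lk0) / lk0


def writhe_from_twist(lk, twist):
    """Lk = Tw + Wr, so any Lk not taken up as twist appears as writhe."""
    return lk - twist


# Hypothetical plasmid: 5,250 bp closed with a linking number of 470.
plasmid_bp = 5250
plasmid_lk = 470
delta_lk = linking_difference(plasmid_lk, plasmid_bp)
sigma = superhelical_density(plasmid_lk, plasmid_bp)
print(f"Lk0 = {relaxed_linking_number(plasmid_bp):.0f}")
print(f"Delta Lk = {delta_lk:+.0f}, sigma = {sigma:+.3f}")
```

A negative linking difference corresponds to negative supercoiling. If the helix in this hypothetical plasmid keeps its relaxed twist, the whole deficit shows up as writhe, which is what writhe_from_twist(plasmid_lk, 500) computes.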

In part because chromosomes may be very large, segments in the middle may act as if their ends are anchored. As a result, they may be unable to distribute excess twist to the rest of the chromosome or to absorb twist to recover from underwinding—the segments may become supercoiled, in other words. In response to supercoiling, they will assume an amount of writhe, just as if their ends were joined.

Supercoiled DNA forms one of two structures, a plectoneme or a toroid, or a combination of both. A negatively supercoiled DNA molecule will produce either a one-start left-handed helix, the toroid, or a two-start right-handed helix with terminal loops, the plectoneme. Plectonemes are typically more common in nature, and this is the shape most bacterial plasmids will take. For larger molecules, it is common for hybrid structures to form: a loop on a toroid can extend into a plectoneme. If all the loops on a toroid extend, it becomes a branch point in the plectonemic structure.

The Importance of DNA supercoiling

DNA supercoiling is important for DNA packaging within all cells. Because the length of DNA can be thousands of times that of a cell, packaging this genetic material into the cell or nucleus (in eukaryotes) is a difficult feat. Supercoiling of DNA reduces the space it occupies and allows much more DNA to be packaged. In prokaryotes, plectonemic supercoils are predominant, because of the circular chromosome and relatively small amount of genetic material. In eukaryotes, DNA supercoiling exists on many levels of both plectonemic and solenoidal supercoils, with the solenoidal supercoiling proving the most effective in compacting the DNA. Solenoidal supercoiling is achieved with histones to form a 10 nm fiber. This fiber is further coiled into a 30 nm fiber, and further coiled upon itself numerous times more.
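
To get a feel for the scale of the packaging problem, the stretched-out length of a genome can be estimated from its size. This is a rough sketch that assumes the commonly quoted B-DNA rise of about 0.34 nm per base pair. The function names are made up for illustration, and the 4.6 million base pair genome and 1 µm cell length in the second call are hypothetical values, not figures from the text.

```python
RISE_NM_PER_BP = 0.34  # assumed axial rise per base pair of B-DNA


def contour_length_um(n_bp, rise_nm=RISE_NM_PER_BP):
    """Length of the stretched-out double helix, in micrometres."""
    return n_bp * rise_nm / 1000.0


def length_ratio(n_bp, cell_length_um, rise_nm=RISE_NM_PER_BP):
    """How many times longer the DNA is than the cell that holds it."""
    return contour_length_um(n_bp, rise_nm) / cell_length_um


# Smallest genophore cited above: Mycoplasma genitalium, 580,073 bp.
print(f"{contour_length_um(580_073):.0f} um of DNA")

# Hypothetical bacterium: 4.6 million bp genome in a cell about 1 um long.
print(f"{length_ratio(4_600_000, 1.0):.0f} times the cell length")
```

Both results depend directly on the assumed rise per base pair and cell length, so they are order-of-magnitude estimates only.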

DNA packaging is greatly increased during nuclear division events such as mitosis or meiosis, where DNA must be compacted and segregated to daughter cells. Condensins and cohesins are structural maintenance of chromosomes (SMC) proteins that aid in the condensation of sister chromatids and the linkage of the centromere in sister chromatids. These SMC proteins induce positive supercoils.

Supercoiling is also required for DNA and RNA synthesis. Because DNA must be unwound for DNA and RNA polymerase action, supercoils will result. The region ahead of the polymerase complex will be unwound; this stress is compensated with positive supercoils ahead of the complex. Behind the complex, DNA is rewound and there will be compensatory negative supercoils. It is important to note that topoisomerases such as DNA gyrase (Type II Topoisomerase) play a role in relieving some of the stress during DNA and RNA synthesis.

Size Variation and ORF Contents in Genomes

An open reading frame (ORF) is the part of a reading frame that contains no stop codons. Bacterial genomes vary in size and in the ORFs they contain.

Learning Objectives

Explain prokaryotic genome size variation and ORFs

Key Takeaways

Key Points

  • Open reading frames are used as one piece of evidence to assist in gene prediction.
  • If a portion of a genome has been sequenced, ORFs can be located by examining each of the three possible reading frames on each strand.
  • Bacterial genomes display variation in size, even among strains of the same species.

Key Terms

  • gene: A unit of heredity; a segment of DNA or RNA that is transmitted from one generation to the next. It carries genetic information such as the sequence of amino acids for a protein.
  • codons: Sequences of three nucleotides in DNA or mRNA, each specifying an amino acid or a stop signal. The genetic code is the set of rules by which information encoded within genetic material (DNA or mRNA sequences) is translated into proteins (amino acid sequences) by living cells. Biological decoding is accomplished by the ribosome, which links amino acids in an order specified by mRNA, using transfer RNA (tRNA) molecules to carry amino acids and to read the mRNA three nucleotides at a time. The genetic code is highly similar among all organisms, and can be expressed in a simple table with 64 entries.
  • open reading frame: A sequence of DNA triplets, between the initiator and terminator codons, that can be transcribed into mRNA and later translated into protein.

In molecular genetics, an open reading frame (ORF) is the part of a reading frame that contains no stop codons. The transcription termination pause site is located after the ORF, beyond the translation stop codon, because if transcription were to cease before the stop codon, an incomplete protein would be made during translation.

Normally, insertions that interrupt the reading frame downstream of the start codon cause a frameshift mutation of the sequence and shift the stop codons out of their original frame.

Open reading frames are used as one piece of evidence to assist in gene prediction. Long ORFs are often used, along with other evidence, to initially identify candidate protein coding regions in a DNA sequence. The presence of an ORF does not necessarily mean that the region is ever translated. For example, in a randomly generated DNA sequence with an equal percentage of each nucleotide, a stop codon would be expected about once every 21 codons, because 3 of the 64 possible codons are stop codons. A simple gene prediction algorithm for prokaryotes might look for a start codon followed by an open reading frame that is long enough to encode a typical protein, where the codon usage of that region matches the frequency characteristic for the given organism's coding regions. Even a long open reading frame by itself is not conclusive evidence for the presence of a gene.
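
The stop-codon expectation and the first step of such a gene finder can be sketched as follows. This is a minimal illustration, not a real gene predictor: it assumes ATG as the only start codon (bacteria also use GTG and TTG), scans only the strand it is given, and omits the codon usage check entirely. The function names, the default threshold of 100 codons, and the toy sequence are all made up for the example.

```python
from itertools import product

STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"  # simplifying assumption: ATG is the only start codon


def stop_codon_fraction():
    """Fraction of the 64 possible codons that are stop codons."""
    all_codons = ["".join(c) for c in product("ACGT", repeat=3)]
    return sum(c in STOP_CODONS for c in all_codons) / len(all_codons)


def chance_of_open_run(n_codons):
    """Probability that n random codons in a row contain no stop codon."""
    return (1 - stop_codon_fraction()) ** n_codons


def find_orfs(seq, min_codons=100):
    """Return (start, end, frame) for ATG...stop stretches on the given strand.

    start and end are 0-based positions in seq; end is just past the stop codon.
    In each frame, only the first ATG after the previous stop is used.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        orf_start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if orf_start is None and codon == START_CODON:
                orf_start = i
            elif orf_start is not None and codon in STOP_CODONS:
                n_codons = (i - orf_start) // 3  # codons before the stop
                if n_codons >= min_codons:
                    orfs.append((orf_start, i + 3, frame))
                orf_start = None
    return orfs


print(f"one stop codon per {1 / stop_codon_fraction():.1f} random codons")
print(f"P(100 codons stay open by chance) = {chance_of_open_run(100):.4f}")

# Toy sequence (made up): ATG, four sense codons, then TAA.
toy = "CCATGGCTGCTGCTGCTTAAGG"
print(find_orfs(toy, min_codons=3))
```

Because long stop-free runs become rare in random sequence, a length threshold filters out most chance ORFs, but, as the paragraph above notes, passing the threshold is still not proof of a gene.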

Open Reading Frames: Frame +1 is the ORF predicted in the database to encode a protein. +2 and +3 are the other two potential ORFs in the same strand and -1, -2, and -3 are the three potential ORFs in the antisense strand.

If a portion of a genome has been sequenced (e.g. 5′-ATCTAAAATGGGTGCC-3′), ORFs can be located by examining each of the three possible reading frames on each strand. In this sequence, two of the three possible reading frames on the strand shown are entirely open, meaning that they do not contain a stop codon:

…A TCT AAA ATG GGT GCC…

…AT CTA AAA TGG GTG CC…

…ATC TAA AAT GGG TGC C…

Possible stop codons in DNA are “TGA”, “TAA”, and “TAG”. Thus, the last reading frame in this example contains a stop codon (TAA), unlike the first two.
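
The same frame-by-frame check can be written as a short script. This is a sketch with hypothetical helper names. It splits the example sequence into complete triplets at each of the three offsets, flags any frame that contains a stop codon, and then repeats the check on the reverse complement to cover the other strand.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def split_into_codons(seq, offset):
    """Complete triplets of seq starting at the given offset (0, 1 or 2)."""
    return [seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)]


def frame_report(seq):
    """Map each frame offset to (codons, is_open) for one strand."""
    report = {}
    for offset in range(3):
        codons = split_into_codons(seq, offset)
        is_open = not any(c in STOP_CODONS for c in codons)
        report[offset] = (codons, is_open)
    return report


fragment = "ATCTAAAATGGGTGCC"  # example sequence from the text
for strand, seq in (("+", fragment), ("-", reverse_complement(fragment))):
    for offset, (codons, is_open) in frame_report(seq).items():
        status = "open" if is_open else "contains a stop codon"
        print(strand, offset, " ".join(codons), status)
```

Offset 0 on the "+" strand corresponds to the last frame listed above (ATC TAA AAT ...), and offsets 1 and 2 correspond to the first two.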

Bacterial genomes display variation in size, even among strains of the same species. Because these microorganisms have very little noncoding or repetitive DNA, variation in their genome size usually reflects differences in gene repertoire. Some species, particularly bacterial parasites and symbionts, have undergone massive genome reduction and simply contain a subset of the genes present in their ancestors.

However, in free-living bacteria, such gene loss cannot explain the observed disparities in genome size because ancestral genomes would have had to contain improbably large numbers of genes. Surprisingly, a substantial fraction of the difference in gene contents in free-living bacteria is due to the presence of ORFans, that is, open reading frames (ORFs) that have no known homologs and are consequently of no known function.

The high numbers of ORFans in bacterial genomes indicate that, with the exception of those species with highly reduced genomes, much of the observed diversity in gene inventories does not result from either the loss of ancestral genes or the transfer from well-characterized organisms (processes that result in a patchy distribution of orthologs but not in unique genes) or from recent duplications (which would likely yield homologs within the same or closely related genome).

Bioinformatic Analyses and Gene Distributions

Bioinformatics is the study of methods for storing, retrieving and analyzing biological data.

Learning Objectives

Describe the purposes and applications of bioinformatics

Key Takeaways

Key Points

  • The primary goal of bioinformatics is to increase the understanding of biological processes.
  • Bioinformatics entails the creation and advancement of databases, algorithms, computational and statistical techniques and theory to solve problems arising from the management and analysis of biological data.
  • Gene Ontology, or GO, is a major bioinformatics initiative to unify the representation of gene and gene product attributes across all species.

Key Terms

  • bioinformatics: Bioinformatics is a branch of biological science which deals with the study of methods for storing, retrieving and analyzing biological data like nucleic acid (DNA/RNA) and protein sequence, structure, function, pathways and genetic interactions.
  • algorithms: In mathematics and computer science, an algorithm is a step-by-step procedure for calculations. Algorithms are used for calculation, data processing, and automated reasoning.
  • gene ontology: The Gene Ontology is a major bioinformatics initiative to unify the representation of gene and gene product attributes across all species.

Bioinformatics is a branch of biological science dealing with the study of methods for storing, retrieving and analyzing biological data like nucleic acid (DNA/RNA) and protein sequence, structure, function, pathways and genetic interactions. It generates new knowledge that is useful in such fields as drug design and development of new software tools. Bioinformatics also deals with algorithms, databases and information systems, web technologies, artificial intelligence and soft computing, information and computation theory, structural biology, software engineering, data mining, image processing, modeling and simulation, discrete mathematics, control and system theory, circuit theory, and statistics.

Map of the human X chromosome: Assembly of the human genome is one of the greatest achievements of bioinformatics

At the beginning of the “genomic revolution,” the term bioinformatics referred to the creation and maintenance of a database to store biological information like nucleotide and amino acid sequences. Development of this type of database involved not only design issues but also the development of complex interfaces whereby researchers could access existing data as well as submit new or revised data.

In order to study how normal cellular activities are altered in different disease states, the biological data must be combined to form a comprehensive picture of these activities. Therefore, the field of bioinformatics has evolved such that the most pressing task now involves the analysis and interpretation of various types of data. This includes nucleotide and amino acid sequences, protein domains and protein structures. The actual process of analyzing and interpreting data is referred to as computational biology. Important sub-disciplines within bioinformatics and computational biology include:

  • the development of tools that enable efficient use of various types of information
  • the development of new algorithms (mathematical formulas) and statistics with which to assess relationships among members of large data sets. For example, methods to locate a gene within a sequence (gene distributions), predict protein structure and/or function, and cluster protein sequences into families of related sequences.

The primary goal of bioinformatics is to increase the understanding of biological processes. What sets it apart from other approaches, however, is its focus on developing and applying computationally intensive techniques to achieve this goal. Examples include pattern recognition, data mining, machine learning algorithms, and visualization. Major research efforts in the field include sequence alignment, gene finding, genome assembly, drug design, drug discovery, protein structure alignment, and the modeling of evolution.

Gene Ontology, or GO, is a major bioinformatics initiative to unify the representation of gene and gene product attributes across all species. More specifically, the project aims to:

  • maintain and develop its controlled vocabulary of gene and gene product attributes
  • annotate genes and gene products and assimilate and disseminate annotation data
  • offer tools for easy access to all aspects of the data provided by the project