Whole-Genome Sequencing

Strategies Used in Sequencing Projects

The strategies used for sequencing genomes include the Sanger method, shotgun sequencing, pairwise-end sequencing, and next-generation sequencing.

Learning Objectives

Compare the different strategies used for whole-genome sequencing: the Sanger method, shotgun sequencing, pairwise-end sequencing, and next-generation sequencing

Key Takeaways

Key Points

  • The Sanger method is a basic sequencing technique that uses fluorescently labeled dideoxynucleotides (ddNTPs) during DNA replication, producing many short strands of replicated DNA that terminate at different points, depending on where a ddNTP was incorporated.
  • Shotgun sequencing is a method that randomly cuts DNA into smaller pieces, sequences them, and then, with the help of a computer, analyzes the fragments for overlapping sequences and reassembles the entire DNA sequence.
  • Pairwise-end sequencing is a type of shotgun sequencing which is used for larger genomes and analyzes both ends of the DNA fragments for overlap.
  • Next-generation sequencing is a type of sequencing which is automated and relies on sophisticated software for rapid DNA sequencing.

Key Terms

  • fluorophore: a molecule or functional group which is capable of fluorescence
  • contig: a set of overlapping DNA segments, derived from a single source of genetic material, from which the complete sequence may be deduced
  • dideoxynucleotide: any nucleotide formed from a deoxynucleotide by loss of a second hydroxyl group from the deoxyribose group

Strategies Used in Sequencing Projects

The basic sequencing technique on which modern sequencing projects build is the chain termination method (also known as the dideoxy method), which was developed by Fred Sanger in the 1970s. The chain termination method involves DNA replication of a single-stranded template with the use of a primer and regular deoxynucleotides (dNTPs), which are the monomers, or single units, of DNA. The primer and dNTPs are mixed with a small proportion of fluorescently labeled dideoxynucleotides (ddNTPs). The ddNTPs are monomers that are missing a hydroxyl group (–OH) at the site at which another nucleotide usually attaches to form a chain. Each of the four ddNTPs is labeled with a different color of fluorophore. Every time a ddNTP is incorporated in the growing complementary strand, it terminates the process of DNA replication, which results in multiple short strands of replicated DNA that are each terminated at a different point during replication. When the reaction mixture is separated into single strands and processed by gel electrophoresis, the newly replicated DNA strands form a ladder due to their differing sizes. Because the ddNTPs are fluorescently labeled, each band on the gel reflects both the size of the DNA strand and the ddNTP that terminated the reaction, so the color of a band identifies the ddNTP incorporated at that position. Reading the colors of the bands in order of size gives the sequence of the newly synthesized strand, which is complementary to the template strand and from which the template sequence follows directly.
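The logic of reading a sequencing ladder can be sketched in a few lines of Python. This is a simplified illustration, not a model of real chemistry: it assumes that every possible terminated product is present exactly once, and the template string and function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def terminated_fragments(template):
    """Return one chain-terminated product for each position of the new strand.

    The template is written 3'->5', so the new strand is built left to right.
    Each product is a prefix of the new strand that ends in a labeled ddNTP.
    """
    new_strand = "".join(COMPLEMENT[base] for base in template)
    return [new_strand[:length] for length in range(1, len(new_strand) + 1)]

def read_ladder(fragments):
    """Sort fragments by size, as a gel would, and read the last base of each."""
    ladder = sorted(fragments, key=len)
    return "".join(fragment[-1] for fragment in ladder)

template = "TACGGATC"  # hypothetical template, written 3'->5'
fragments = terminated_fragments(template)
new_strand = read_ladder(fragments)
print(new_strand)  # sequence of the synthesized strand, complementary to the template
```

Sorting by length stands in for electrophoresis, and the last base of each fragment stands in for the color of its band.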

Sanger’s Method: Frederick Sanger’s dideoxy chain termination method uses dideoxynucleotides to terminate DNA synthesis at different points. The resulting fragments are separated on the basis of size, and the sequence is read from the order of the bands.

Structure of a Dideoxynucleotide: A dideoxynucleotide is similar in structure to a deoxynucleotide, but is missing the 3′ hydroxyl group (indicated by the box). When a dideoxynucleotide is incorporated into a DNA strand, DNA synthesis stops.

Early Strategies: Shotgun Sequencing and Pairwise-End Sequencing

In the shotgun sequencing method, several copies of a DNA fragment are cut randomly into many smaller pieces (somewhat like what happens to a round shot cartridge when fired from a shotgun). All of the segments are then sequenced using the chain termination method. Then, with the help of a computer, the fragments are analyzed to see where their sequences overlap. By matching overlapping sequences at the ends of the fragments, the entire DNA sequence can be reassembled. A larger sequence that is assembled from overlapping shorter sequences is called a contig.

As an analogy, suppose someone has four copies of a landscape photograph that you have never seen, so you have no idea how it should look. The person rips up each copy by hand, so that each copy yields pieces of different sizes, then mixes all of the pieces together and asks you to reconstruct the photograph. In one of the smaller pieces you see a mountain. In a larger piece, you see that the same mountain is behind a lake. A third fragment shows only the lake, but it reveals that there is a cabin on the shore of the lake. Therefore, from the overlapping information in these three fragments, you know that the picture contains a mountain behind a lake that has a cabin on its shore. This is the principle behind reconstructing entire DNA sequences using shotgun sequencing.
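The overlap-and-merge idea can be illustrated with a minimal greedy assembler in Python. This is a sketch under strong assumptions: the reads are error-free, all come from the same strand, none is contained inside another, and the sequence has no repeats. The reads and function names are hypothetical, and real assemblers use considerably more sophisticated methods.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b.

    Overlaps shorter than min_len are ignored and reported as 0.
    """
    for size in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0

def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the largest overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicate reads
    while len(reads) > 1:
        best_size, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    size = overlap(a, b, min_len)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            break  # no remaining overlaps; the pieces stay as separate contigs
        merged = reads[best_i] + reads[best_j][best_size:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads

# Hypothetical reads taken from the sequence ATGCGTACGTTAGC
reads = ["ATGCGTAC", "GTACGTTA", "GTTAGC"]
contigs = greedy_assemble(reads)
print(contigs)  # → ['ATGCGTACGTTAGC']
```

Each merged string plays the role of a contig. If no overlaps remain, the function returns several contigs rather than one sequence.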

Originally, shotgun sequencing only analyzed one end of each fragment for overlaps. This was sufficient for sequencing small genomes. However, the desire to sequence larger genomes, such as that of a human, led to the development of double-barrel shotgun sequencing, more formally known as pairwise-end sequencing. In pairwise-end sequencing, both ends of each fragment are analyzed for overlap. Pairwise-end sequencing is, therefore, more cumbersome than shotgun sequencing, but it is easier to reconstruct the sequence because there is more available information.

Next-generation Sequencing

Since 2005, the automated sequencing techniques used by laboratories have fallen under the umbrella of next-generation sequencing, a group of automated techniques used for rapid DNA sequencing. These automated, low-cost sequencers can generate sequences of hundreds of thousands or millions of short fragments (25 to 500 base pairs) in the span of one day. Sophisticated software is used to manage the cumbersome process of putting all the fragments in order.

Use of Whole-Genome Sequences of Model Organisms

Sequencing genomes of model organisms allows scientists to study homologous proteins in more complex eukaryotes, such as humans.

Learning Objectives

Describe the model organisms used in whole-genome sequencing

Key Takeaways

Key Points

  • The first genome to be completely sequenced was that of a bacterial virus, bacteriophage φX174, which is 5386 base pairs long.
  • Scientists utilize genome sequencing from model organisms to study homologous proteins and establish evolutionary relationships.
  • Genome annotation is the process of attaching biological information to gene sequences identified using whole-genome sequencing.
  • Model organisms include the fruit fly (Drosophila melanogaster), brewer's yeast (Saccharomyces cerevisiae), the nematode (Caenorhabditis elegans), and the mouse (Mus musculus).

Key Terms

  • genome annotation: the process of attaching biological information to gene sequences.
  • model organism: any organism (e.g. the fruit fly) that has been extensively studied as an example of many others and from which general principles may be established

Use of Whole-Genome Sequences of Model Organisms

The first genome to be completely sequenced was that of a bacterial virus, the bacteriophage φX174 (5386 base pairs). This was accomplished by Fred Sanger using shotgun sequencing. Several other organelle and viral genomes were later sequenced. The first free-living organism whose genome was sequenced was the bacterium Haemophilus influenzae, which was accomplished by Craig Venter and colleagues in 1995. Approximately 74 different laboratories collaborated on the sequencing of the genome of the yeast Saccharomyces cerevisiae, which began in 1989 and was completed in 1996. It took this long because it was 60 times bigger than any other genome that had been sequenced at that point. By 1997, the genome sequences of two important model organisms were available: the bacterium Escherichia coli K12 and the yeast Saccharomyces cerevisiae. The genomes of other model organisms, such as the mouse Mus musculus, the fruit fly Drosophila melanogaster, and the nematode Caenorhabditis elegans, as well as that of the human, Homo sapiens, are now known. Much basic research is performed using model organisms because the information can be applied to the biological processes of genetically similar organisms. Having entire genomes sequenced aids these research efforts.

The process of attaching biological information to gene sequences is called genome annotation. Annotation aids researchers doing basic experiments in molecular biology, such as designing PCR primers and RNA targets.

Sequencing genomes allows scientists to identify homologous proteins and establish evolutionary relationships. Furthermore, if a newly discovered protein is homologous to a known protein, scientists can use that homology to make an educated guess as to how the new protein functions.
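One simple measure that feeds into such comparisons is the percent identity between two aligned sequences. The Python sketch below assumes the two sequences have already been aligned to the same length, with "-" marking gaps. The sequences are hypothetical, and identity alone does not establish homology: dedicated alignment tools and statistical scoring are used in practice.

```python
def percent_identity(seq_a, seq_b):
    """Percent of aligned, non-gap positions at which two sequences match."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Ignore any column in which either sequence has a gap character.
    columns = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not columns:
        return 0.0
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns)

known_protein = "MKTAYIAKQR"  # hypothetical fragment of a characterized protein
new_protein = "MKTAHIAKQK"    # hypothetical fragment of a newly found protein
print(percent_identity(known_protein, new_protein))  # → 80.0
```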

Eukaryotes are organisms whose cells enclose complex, membrane-bound organelles. The defining characteristic that sets eukaryotes apart from prokaryotes is the nucleus, bounded by a nuclear envelope, in which the organism’s genetic information is contained. The first eukaryotic genome to be sequenced was that of S. cerevisiae, the yeast used in baking and brewing. It is the most-studied eukaryotic model organism in molecular and cell biology, similar to E. coli’s role in the study of prokaryotic organisms. Research on many proteins that are important to humans is done by examining their homologs in yeast. For example, signaling proteins and protein-processing enzymes were discovered with the help of the yeast genome.

S. cerevisiae: A Model Organism for Eukaryotes: Saccharomyces cerevisiae, a yeast, is used as a model organism for studying signaling proteins and protein-processing enzymes which have homologs in humans.

Uses of Genome Sequences

Genome sequences and expression can be analyzed using DNA microarrays, which can contribute to detection of disease and genetic disorders.

Learning Objectives

Describe the various uses of genome sequences

Key Takeaways

Key Points

  • DNA microarrays can be used to detect gene expression within specific samples by analyzing active genes and sequences using an array of DNA fragments fixed to a slide.
  • Genome sequences can be used to discover the possibility of disease and genetic disorders prior to onset.
  • Genome sequences can also be used to develop agrochemicals and pharmaceuticals.

Key Terms

  • microarray: any of several devices containing a two-dimensional array of small quantities of biological material used for various types of assays
  • genomics: the study of the complete genome of an organism

Uses of Genome Sequences

A DNA microarray is a tool used to detect gene expression: an array of DNA fragments fixed to a glass slide or a silicon chip is analyzed to identify active genes and sequences. Almost one million genotypic abnormalities can be discovered using microarrays, whereas whole-genome sequencing can provide information about all six billion base pairs in the human genome (counting both copies of each chromosome). Although the study of medical applications of genome sequencing is interesting, this discipline tends to dwell on abnormal gene function. Knowledge of the entire genome will allow diseases with a future onset and other genetic disorders to be discovered early, which will allow for more informed decisions to be made about lifestyle, medication, and having children. Genomics is still in its infancy, although someday it may become routine to use whole-genome sequencing to screen every newborn to detect genetic abnormalities.
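As a rough illustration of how microarray readings might be turned into a list of active genes, the Python sketch below flags spots whose signal is well above background. The gene names, intensity values, and twofold cutoff are all hypothetical. Real analyses involve normalization, replicates, and statistical testing.

```python
def active_genes(intensities, background, fold=2.0):
    """Return the spots whose signal exceeds `fold` times the background."""
    return sorted(gene for gene, signal in intensities.items()
                  if signal > fold * background)

# Hypothetical fluorescence readings, one per spot on the array
spots = {"geneA": 1200.0, "geneB": 90.0, "geneC": 450.0, "geneD": 150.0}
print(active_genes(spots, background=100.0))  # → ['geneA', 'geneC']
```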

DNA Microarray: DNA microarrays can be used to analyze gene expression within the genome.

In addition to disease and medicine, genomics can contribute to the development of novel enzymes that convert biomass to biofuel, which results in higher crop and fuel production, and lower cost to the consumer. This knowledge should allow better methods of control over the microbes that are used in the production of biofuels. Genomics could also improve the methods used to monitor the impact of pollutants on ecosystems and help clean up environmental contaminants. In addition, genomics has allowed for the development of agrochemicals and pharmaceuticals that could benefit medical science and agriculture.

It sounds great to have all the knowledge we can get from whole-genome sequencing; however, humans have a responsibility to use this knowledge wisely. Otherwise, it could be easy to misuse the power of such knowledge, leading to discrimination based on a person’s genetics, human genetic engineering, and other ethical concerns. This information could also lead to legal issues regarding health and privacy.