DNA Structure and Sequencing

The Structure and Sequence of DNA

DNA is a double helix of two anti-parallel, complementary strands having a phosphate-sugar backbone with nitrogenous bases stacked inside.

Learning Objectives

Describe the structure of DNA and summarize the importance of DNA sequencing

Key Takeaways

Key Points

  • The two DNA strands are anti-parallel in nature; that is, the 3′ end of one strand faces the 5′ end of the other strand.
  • The nucleotides that comprise DNA contain a nitrogenous base, a deoxyribose sugar, and a phosphate group; nucleotides are covalently linked to one another by phosphodiester bonds.
  • Nucleotide bases can be classified as purines (containing a double-ring structure) or pyrimidines (containing a single-ring structure).
  • Adenine (purine) and thymine (pyrimidine) are complementary base pairs as are guanine (purine) and cytosine (pyrimidine).
  • DNA sequencing is the process of determining the precise order of nucleotides within a DNA molecule.

Key Terms

  • deoxyribose: a derivative of the pentose sugar ribose in which the 2′ hydroxyl (-OH) is reduced to a hydrogen (H); a constituent of the nucleotides that comprise deoxyribonucleic acid, or DNA
  • hydrogen bond: a weak bond in which a hydrogen atom already covalently bonded to an oxygen or nitrogen atom in one molecule is attracted to an electronegative atom (usually nitrogen or oxygen) in the same or a different molecule
  • nucleotide: the monomer comprising DNA or RNA molecules; consists of a nitrogenous heterocyclic base that can be a purine or pyrimidine, a five-carbon pentose sugar, and a phosphate group

The monomeric building blocks of DNA are deoxyribomononucleotides (usually referred to as just nucleotides), and DNA is formed from linear chains, or polymers, of these nucleotides. The components of the nucleotide used in DNA synthesis are a nitrogenous base, a deoxyribose, and a phosphate group. The nucleotide is named depending on which nitrogenous base is present. The nitrogenous base can be a purine such as adenine (A) and guanine (G), characterized by double-ring structures, or a pyrimidine such as cytosine (C) and thymine (T), characterized by single-ring structures. In polynucleotides (the linear polymers of nucleotides) the nucleotides are connected to each other by covalent bonds known as phosphodiester bonds or phosphodiester linkages.

Nucleotide Structure: Each nucleotide is made up of a sugar, a phosphate group, and a nitrogenous base. The sugar is deoxyribose in DNA and ribose in RNA. In their mononucleotide form, nucleotides can have one, two, or three phosphates attached to them. When linked together in polynucleotide chains, the nucleotides always have just one phosphate. A molecule with just a nitrogenous base and a sugar is known as a nucleoside. Once at least one phosphate is covalently attached, it is known as a nucleotide.

James Watson and Francis Crick, drawing on X-ray diffraction data produced by Rosalind Franklin and Maurice Wilkins, are credited with figuring out the structure of DNA. Watson and Crick proposed that DNA is made up of two polynucleotide strands that are twisted around each other to form a right-handed helix.

The two polynucleotide strands are anti-parallel in nature. That is, they run in opposite directions.

The sugars and phosphates of the nucleotides form the backbone of the structure, whereas the pairs of nitrogenous bases are pointed towards the interior of the molecule.

The twisting of the two strands around each other results in the formation of uniformly-spaced major and minor grooves bordered by the sugar-phosphate backbones of the two strands.

Three representations of DNA’s double helical structure: A is a spacefill model of DNA, where every atom is represented as a sphere. The two anti-parallel polynucleotide strands are colored differently to illustrate how they coil around each other. B is a cartoon model of DNA, where the sugar-phosphate backbones are represented as violet strands and the nitrogenous bases are represented as color-coded rings. C is another spacefill model, with the sugar-phosphate atoms colored violet and all nitrogenous base atoms colored green. The major and minor grooves, which wrap around the entire molecule, are apparent as the spaces between the sugar-phosphate backbones.

The diameter of the DNA double helix is 2 nm and is uniform throughout. Only the pairing between a purine and pyrimidine can explain the uniform diameter. That is to say, at each point along the DNA molecule, the two sugar phosphate backbones are always separated by three rings, two from a purine and one from a pyrimidine.
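
The ring-counting argument above can be checked with a few lines of Python. This is only an illustrative sketch: the names RINGS and rings_between_backbones are made up for this example, and the ring counts are the ones given in the text (two per purine, one per pyrimidine).

```python
# Ring counts taken from the text: purines (A, G) have two rings,
# pyrimidines (C, T) have one. Names here are hypothetical.
RINGS = {"A": 2, "G": 2, "C": 1, "T": 1}

def rings_between_backbones(base1, base2):
    """Total number of rings separating the two backbones for one pair."""
    return RINGS[base1] + RINGS[base2]

# Purine-pyrimidine pairs give 3 rings; purine-purine gives 4 and
# pyrimidine-pyrimidine gives 2, so neither fits a uniform width.
for pair in [("A", "T"), ("G", "C"), ("A", "G"), ("C", "T")]:
    print(pair, rings_between_backbones(*pair))
```

Only the purine-pyrimidine pairings produce the same count every time, which is consistent with the uniform 2 nm diameter.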

The two strands are held together by base pairing between nitrogenous bases of one strand and nitrogenous bases from the other strand. Base pairing takes place between a purine and pyrimidine stabilized by hydrogen bonds: A pairs with T via two hydrogen bonds and G pairs with C via three hydrogen bonds.
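
The pairing rules, together with the anti-parallel arrangement of the strands, can be expressed as a short Python sketch. The function names are hypothetical, and the example sequence is arbitrary; the hydrogen-bond counts are those stated above (two for A-T, three for G-C).

```python
# Watson-Crick pairing rules and hydrogen bonds per pair, as given in the text.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
H_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}

def reverse_complement(strand):
    """Return the partner strand written 5'->3'.

    Because the strands are anti-parallel, the partner is read in the
    opposite direction, hence the reversal.
    """
    return "".join(COMPLEMENT[base] for base in reversed(strand))

def hydrogen_bonds(strand):
    """Count the hydrogen bonds holding a strand to its complement."""
    return sum(H_BONDS[base] for base in strand)

seq = "ATGCCG"  # arbitrary example sequence, written 5'->3'
print(reverse_complement(seq))  # CGGCAT
print(hydrogen_bonds(seq))      # 16
```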

The interior base pairs are rotated with respect to one another, but they are also stacked on top of each other when the molecule is viewed looking up or down its long axis.

Each base pair is separated from the previous base pair by a height of 0.34 nm and each 360° turn of the helix travels 3.4 nm along the long axis of the molecule. Therefore, ten base pairs are present per turn of the helix.
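
The arithmetic behind these figures is simple enough to sketch in Python. The constant and function names below are invented for illustration, and the 1000-base-pair molecule is an arbitrary example; the two measurements are the ones quoted in the text.

```python
# Helix dimensions quoted in the text.
RISE_PER_BP_NM = 0.34   # height between successive base pairs
PITCH_NM = 3.4          # distance travelled along the axis per full turn

def helix_length_nm(n_bp):
    """Length along the long axis of a helix with n_bp base pairs."""
    return n_bp * RISE_PER_BP_NM

def helical_turns(n_bp):
    """Number of full 360-degree turns made by n_bp base pairs."""
    return helix_length_nm(n_bp) / PITCH_NM

bp_per_turn = PITCH_NM / RISE_PER_BP_NM
print(round(bp_per_turn, 2))            # base pairs per turn
print(round(helix_length_nm(1000), 1))  # length in nm of a 1000 bp molecule
print(round(helical_turns(1000), 1))    # turns in a 1000 bp molecule
```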

DNA Structure: DNA has (a) a double helix structure and (b) phosphodiester bonds. The (c) major and minor grooves are binding sites for DNA-binding proteins during processes such as transcription (the copying of DNA into RNA) and replication.

DNA sequencing is the process of determining the precise order of nucleotides within a DNA molecule. Rapid DNA sequencing methods have greatly accelerated biological and medical research and discovery. Knowledge of DNA sequences has become indispensable for basic biological research and in numerous applied fields such as diagnostics, biotechnology, forensic biology, and biological systematics. The rapid speed of sequencing attained with modern technology has been instrumental in obtaining complete DNA sequences, or genomes, of numerous types and species of life, including the human genome and those of other animal, plant, and microbial species.

DNA Sequencing Techniques

DNA sequencing techniques are used to determine the order of nucleotides (A, T, C, G) in a DNA molecule.

Learning Objectives

Differentiate among the techniques used to sequence DNA

Key Takeaways

Key Points

  • Genome sequencing will greatly advance our understanding of genetic biology and has vast potential for medical diagnosis and treatment.
  • DNA sequencing technologies have gone through at least three “generations”: Sanger sequencing and Gilbert sequencing were first-generation, pyrosequencing was second-generation, and Illumina sequencing is next-generation.
  • Sanger sequencing is based on the use of chain terminators, ddNTPs, that are added to growing DNA strands and terminate synthesis at different points.
  • Illumina sequencing involves running up to 500,000,000 different sequencing reactions simultaneously on a single small slide. It makes use of a modified replication reaction and uses fluorescently-tagged nucleotides.
  • Shotgun sequencing is a technique for determining the sequence of entire chromosomes and entire genomes based on producing random fragments of DNA that are then assembled by computers which order fragments by finding overlapping ends.

Key Terms

  • DNA sequencing: a technique used in molecular biology that determines the sequence of nucleotides (A, C, G, and T) in a particular region of DNA
  • dideoxynucleotide: any nucleotide formed from a deoxynucleotide by loss of a second hydroxyl group from the deoxyribose group
  • in vitro: any biochemical process done outside of its natural biological environment, such as in a test tube, petri dish, etc. (from the Latin for “in glass”)

DNA Sequencing Techniques

While techniques to sequence proteins have been around since the 1950s, techniques to sequence DNA were not developed until the mid-1970s, when two distinct sequencing methods were developed almost simultaneously, one by Walter Gilbert’s group at Harvard University, the other by Frederick Sanger’s group at Cambridge University. However, until the 1990s, the sequencing of DNA was a relatively expensive and long process. The use of radiolabeled nucleotides compounded the problem by raising safety concerns. With currently available technology and automated machines, the process is cheaper, safer, and can be completed in a matter of hours. The Sanger sequencing method was used for the human genome sequencing project, which finished its sequencing phase in 2003, but today both it and the Gilbert method have been largely replaced by better methods.

Sanger Method: In Frederick Sanger’s dideoxy chain termination method, fluorescent-labeled dideoxynucleotides are used to generate DNA fragments that terminate at each nucleotide along the template strand. The DNA is separated by capillary electrophoresis on the basis of size. From the order of fragments formed, the DNA sequence can be read. The smallest fragments were terminated earliest, and they come out of the column first, so the order in which different fluorescent tags exit the column is also the sequence of the strand. The DNA sequence readout is shown on an electropherogram that is generated by a laser scanner.

Sanger Sequencing

The Sanger method is also known as the dideoxy chain termination method. This sequencing method is based on the use of chain terminators, the dideoxynucleotides (ddNTPs). The ddNTPs differ from deoxynucleotides by the lack of a free 3′ OH group on the five-carbon sugar. If a ddNTP is added to a growing DNA strand, the chain is not extended any further because the free 3′ OH group needed to add another nucleotide is not available. By using a predetermined ratio of deoxyribonucleotides to dideoxynucleotides, it is possible to generate DNA fragments of different sizes when replicating DNA in vitro.

A Sanger sequencing reaction is just a modified in vitro DNA replication reaction. As such, the following components are needed: template DNA (the DNA whose sequence will be determined), DNA polymerase to catalyze the replication reactions, a primer that base pairs just before the portion of the DNA you want to sequence, dNTPs, and ddNTPs. The ddNTPs are what distinguish a Sanger sequencing reaction from an ordinary replication reaction. Most of the time in a Sanger sequencing reaction, DNA polymerase will add a proper dNTP to the growing strand it is synthesizing in vitro. But at random locations, it will instead add a ddNTP. When it does, that strand will be terminated at the ddNTP just added. If enough template DNAs are included in the reaction mix, each one will have the ddNTP inserted at a different random location, and there will be at least one DNA terminated at each different nucleotide along its length for as long as the in vitro reaction can take place (about 900 nucleotides under optimal conditions).
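
To see why enough template copies are needed, it can help to model termination as a simple chance event. The sketch below assumes, purely for illustration, that every incorporation step independently picks a ddNTP with the same probability p; real reactions are more complicated, and the value of p and the function names are hypothetical rather than taken from any protocol.

```python
def termination_probability(k, p):
    """Chance that a strand ends exactly at position k.

    Assumes (as a simplification) that each step independently
    incorporates a ddNTP with probability p: the strand must survive
    k - 1 steps and then terminate on step k.
    """
    return (1 - p) ** (k - 1) * p

def expected_copies(k, p, n_templates):
    """Expected number of fragments of length k among n_templates strands."""
    return n_templates * termination_probability(k, p)

p = 0.01  # hypothetical ddNTP incorporation probability per step
for k in (1, 100, 500, 900):
    print(k, expected_copies(k, p, n_templates=1_000_000))
```

Under this assumption, long fragments become steadily rarer, which is one way to picture why the readable length of a single reaction is limited.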

The ddNTPs which terminate the strands have fluorescent labels covalently attached to them. Each of the four ddNTPs carries a different label, so each different ddNTP will fluoresce a different color.

After the reaction is over, the products are subjected to capillary electrophoresis. All the newly synthesized fragments, each terminated at a different nucleotide and so each a different length, are separated by size. As each differently sized fragment exits the capillary column, a laser excites the fluorescent tag on its terminal nucleotide. From the color of the resulting fluorescence, a computer can keep track of which nucleotide was present as the terminating nucleotide. The computer also keeps track of the order in which the terminating nucleotides appeared, which gives the sequence of the newly synthesized strand and, by complementarity, of the template DNA used in the original reaction.
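
The read-out logic can be sketched as a toy simulation in Python. It is idealized: it assumes exactly one terminated fragment for every position, treats the fluorescent color as simply the identity of the last base, and uses made-up function names and an arbitrary example strand.

```python
import random

def sanger_fragments(new_strand):
    """One terminated fragment per position: every prefix of the new strand."""
    return [new_strand[:i] for i in range(1, len(new_strand) + 1)]

def read_sequence(fragments):
    """Mimic electrophoresis and detection.

    Fragments are ordered by size (shortest exits the column first) and
    the labeled terminal base of each one is recorded in that order.
    """
    ordered = sorted(fragments, key=len)
    return "".join(fragment[-1] for fragment in ordered)

strand = "GATTACAGGC"            # arbitrary newly synthesized strand
fragments = sanger_fragments(strand)
random.shuffle(fragments)        # fragments are mixed together in the tube
print(read_sequence(fragments))  # GATTACAGGC
```

Note that what is read is the newly synthesized strand; the template sequence follows from it by base pairing.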

Second Generation and Next-generation Sequencing

The Sanger and Gilbert methods of sequencing DNA are often called “first-generation” sequencing because they were the first to be developed. In the late 1990s, faster and cheaper methods, called second-generation sequencing methods, began to be developed. The most widely used second-generation method was pyrosequencing.

Today a number of newer sequencing methods are available and others are in the process of being developed. These are often called next-generation sequencing methods. The most widely-used sequencing method currently is one called Illumina sequencing (after the name of the company which commercialized the technique), but numerous competing methods are in the developmental pipeline and may supplant Illumina sequencing.

In Illumina sequencing, up to 500,000,000 separate sequencing reactions are run simultaneously on a single slide (the size of a microscope slide) put into a single machine. Each reaction is analyzed separately and the sequences generated from all 500 million DNAs are stored in an attached computer. Each sequencing reaction is a modified replication reaction involving fluorescently tagged nucleotides, but no chain-terminating dideoxynucleotides are needed.

When the human genome was first sequenced using Sanger sequencing, it took several years, hundreds of labs working together, and around $100 million to bring the sequence to near completion. Next-generation sequencing can sequence a comparably sized genome in a matter of days, using a single machine, at a cost of under $10,000. Many researchers have set a goal of improving sequencing methods further, until a single human genome can be sequenced for under $1000.

Shotgun Sequencing

Sanger sequencing can only produce several hundred nucleotides of sequence per reaction. Most next-generation sequencing techniques generate even smaller blocks of sequence. Genomes are made up of chromosomes which are tens to hundreds of millions of base pairs long. They can only be sequenced in tiny fragments, and the tiny fragments have to be put in the correct order to generate the uninterrupted genome sequence. Most genomic sequencing projects today make use of an approach called whole genome shotgun sequencing.

Whole genome shotgun sequencing involves isolating many copies of the chromosomal DNA of interest. The chromosomes are all fragmented at random locations into pieces small enough to be sequenced (a few hundred base pairs). As a result, each copy of the same chromosome is fragmented at different locations, and the fragments from the same part of the chromosome will overlap each other. Each fragment is sequenced, and sophisticated computer algorithms compare all the different fragments to find which overlaps with which. By lining up the overlapped regions, a process called tiling, the computer can find the largest possible continuous sequences that can be generated from the fragments. Ultimately, the sequences of entire chromosomes are assembled.
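
The overlap-and-merge idea can be illustrated with a small greedy assembler in Python. This is a teaching sketch, not how production assemblers work: it assumes error-free fragments from a single strand with no repeated regions, and the function names, the min_len threshold, and the example fragments are all invented for the illustration.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that matches a prefix of b."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(fragments, min_len=3):
    """Repeatedly merge the pair of fragments with the largest overlap."""
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates
    # Drop fragments that lie entirely inside another fragment.
    frags = [f for f in frags if not any(f != g and f in g for g in frags)]
    while len(frags) > 1:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # no remaining overlaps: leave separate contiguous pieces
        merged = frags[best_i] + frags[best_j][best_n:]
        # Remove the two merged fragments and anything now contained in the result.
        frags = [f for f in frags if f not in merged] + [merged]
    return frags

# Three overlapping fragments of a made-up 20-nucleotide sequence.
reads = ["TAGCCTAGGA", "ATGCGTAC", "GTACGTTAGC"]
print(greedy_assemble(reads))  # ['ATGCGTACGTTAGCCTAGGA']
```

If some region is not covered by overlapping fragments, the function returns more than one piece, which mirrors the gaps that remain in real assemblies when coverage is incomplete.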

Whole genome shotgun sequencing: In shotgun sequencing, multiple copies of the same chromosome are isolated and then fragmented in random locations. The different copies of the chromosome end up generating different length fragments. When the complete collection of fragments has been sequenced, comparing the sequences of all the fragments will reveal which fragments have ends that overlap with other fragments. The complete sequence from one end of the original DNA to the other can be assembled by following the sequence from the first overlapping fragment to the last.

Genome sequencing will greatly advance our understanding of genetic biology. It has vast potential for medical diagnosis and treatment.