Nucleotides, DNA, and RNA

Last updated: June 13, 2022

Summary

The genetic information of an organism is stored in the form of nucleic acids. Nucleic acids, DNA (deoxyribonucleic acid) and RNA (ribonucleic acid), are long linear polymers composed of nucleotide building blocks. Each nucleotide is composed of a sugar, a phosphate residue, and a nitrogenous base (a purine or pyrimidine). DNA is longer than RNA and contains the entire genetic information of an organism encoded in the sequence of the bases. In contrast, RNA only contains a portion of the information and can have completely different functions in the cell.

DNA is structurally characterized by its double helix: two antiparallel, complementary nucleic acid strands that spiral around one another. The DNA backbone, with alternately linked sugar and phosphate residues, is located on the outside. The bases are located inside the helix and form the base pairs adenine and thymine or guanine and cytosine, which are linked by hydrogen bonds.

The human genome comprises 3.2 x 10⁹ base pairs, which are distributed over 23 pairs of chromosomes. Each chromosome is a linear DNA molecule of a certain length. The chromosome is only well visualized under the light microscope during the metaphase of mitosis, as it is maximally condensed during this phase. Chromosomes are present as pairs in most cells of the body. One chromosome in each of the 23 pairs originates from the mother and the other from the father.

The two chromosomes of a pair are termed homologous because each carries a variant of the same genes. Alterations in the number or structure of the chromosomes lead to various conditions, e.g., developmental disorders. Chromosomal assessment with different molecular biology and cytogenetic methods often allows for a clear diagnosis.

Nucleotides

Structure
- Nitrogenous base (a purine or pyrimidine)
- Pentose sugar
- Phosphate group
Bonds
- Nucleoside: base and sugar (ribose or deoxyribose), linked by an N-glycosidic bond
- Nucleotide: nucleoside and phosphate group, linked by a phosphoester bond at the 5' carbon of the sugar (adjacent nucleotides in a chain are joined by 3'-5' phosphodiester bonds)

General structure of nucleosides and nucleotides

NucleoSides consist of base and Sugar (deoxyribose). NucleoTides consist of base, deoxyribose and phosphaTe.

Nucleobases

Cytosine forms 3 H bonds with guanine, resulting in a strong base pair.
- More stable than bonds consisting of 2 H bonds (A-T)
- The higher the number of cytosine-guanine bonds in DNA, the higher its melting temperature.
Other than uracil, there are many other bases that may be created after the initial nucleic acid chain formation, for example:
- Hypoxanthine
  - Created from adenine via deamination during RNA editing
  - Present as inosine in tRNA and plays an important role in ensuring proper wobble base pair translation (see “Wobble hypothesis”)
- Xanthine
  - Intermediate of purine metabolism
  - Created from guanine via deamination
Amino acids required for purine synthesis: glycine, aspartate, and glutamine
See “Purines and pyrimidines” for more details.

Overview of pyrimidines and purines
	Rings	Base	Notable characteristics	As a nucleoside unit in RNA	As a nucleoside unit in DNA
Pyrimidines	1 ring	Cytosine (C)	Forms 3 H bonds	Cytidine	Deoxycytidine
		Thymine (T)	Created from 5-methylcytosine via deamination; also arises from methylation of uracil; has a methyl group; forms 2 H bonds	Not present	Thymidine
		Uracil (U)	Created from cytosine via deamination during RNA editing; forms 2 H bonds	Uridine	Not present
Purines	2 rings	Adenine (A)	Forms 2 H bonds	Adenosine	Deoxyadenosine
		Guanine (G)	Has a ketone group; forms 3 H bonds	Guanosine	Deoxyguanosine

Overview of purine and pyrimidine bases; source of atoms in the purine ring

“A mean person GAGs a PURring cat!” (Three Amino acids, Glycine, Aspartate, and Glutamine, are necessary for PURine synthesis.)

“C-G stabilizes DNA Crazy Good!” (C-G bonds are extremely stable.)

“PYRates Capture 1 Undersea Treasure.” (PYRimidine bases are Cytosine, Uracil, and Thymine and consist of 1 ring.)
“PURe A Glass for 2.” (PURine bases are Adenine and Guanine and consist of 2 rings.)

Thymine contains a methyl group and is only found in DNA; uracil is only found in RNA.

Nucleic acid sugars

Structure: The sugar found in nucleic acids is a pentose, which has a five-atom ring.
- DNA: deoxyribose
- RNA: ribose
Pentose binds
- Bases via N-glycosidic bonds
- Phosphate residue via phosphodiester bonds

Ribose and deoxyribose

Phosphate group

A nucleotide can have one, two, or three phosphate groups (also termed “nucleoside monophosphate”, “diphosphate”, and “triphosphate”, respectively).
Nucleic acids are composed of nucleoside monophosphates.
Nucleoside diphosphates and nucleoside triphosphates (e.g., ATP) are found in biochemical processes requiring energy.
- The phosphoanhydride bonds store a high amount of energy that can be utilized in biochemical processes, e.g., when attacked by a 3' hydroxyl group during nucleic acid synthesis.
- The nucleotide that is added to the 3' end of the growing nucleic acid initially has three phosphate groups. Cleavage of the two terminal phosphate groups (released as pyrophosphate) supplies the energy necessary to form the phosphodiester bonds that build the DNA backbone.
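
The naming scheme described above (nucleoside name plus the number of phosphate groups) can be sketched as a small lookup. This is only an illustration: the function name `nucleotide_name` and the dictionary layout are hypothetical, and the nucleoside names are those listed in the overview table of pyrimidines and purines.

```python
# Nucleoside names per base and sugar, as listed in the overview table.
NUCLEOSIDES = {
    ("C", "ribose"): "cytidine",
    ("U", "ribose"): "uridine",
    ("A", "ribose"): "adenosine",
    ("G", "ribose"): "guanosine",
    ("C", "deoxyribose"): "deoxycytidine",
    ("T", "deoxyribose"): "thymidine",
    ("A", "deoxyribose"): "deoxyadenosine",
    ("G", "deoxyribose"): "deoxyguanosine",
}

# Prefix used for one, two, or three phosphate groups.
PHOSPHATE_PREFIX = {1: "mono", 2: "di", 3: "tri"}


def nucleotide_name(base: str, sugar: str, phosphates: int) -> str:
    """Return the full nucleotide name, e.g. 'adenosine triphosphate'."""
    if phosphates not in PHOSPHATE_PREFIX:
        raise ValueError("a nucleotide carries 1, 2, or 3 phosphate groups")
    nucleoside = NUCLEOSIDES[(base.upper(), sugar)]
    return f"{nucleoside} {PHOSPHATE_PREFIX[phosphates]}phosphate"


print(nucleotide_name("A", "ribose", 3))       # adenosine triphosphate
print(nucleotide_name("T", "deoxyribose", 1))  # thymidine monophosphate
```

Combinations that the table marks as "not present" (e.g., thymine with ribose) are deliberately absent from the lookup and raise a `KeyError`.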

ATP

Function of nucleotides and their derivatives

Nucleotides and nucleotide derivatives have important functions in the body.

Building blocks of nucleic acids
Source of energy: especially as a universal energy carrier of the cell in the form of ATP, but also GTP
Signal molecules: especially the second messengers cAMP (cyclic adenosine monophosphate) and cGMP (cyclic guanosine monophosphate), both phosphoric esters
Activators for the transfer of groups: Through the potential of forming energy-rich bonds, nucleotides are able to transfer a molecule onto another in biosynthesis, e.g.:
- UDP-Glucose is an active form of glucose in glycogenesis.
- Dietary choline can be activated to citicoline by CTP and be used in the synthesis of phosphatidylcholine.
- 3'-Phosphoadenosine-5'-phosphosulfate (PAPS) serves as a sulfate group donor in sulfatide synthesis.
- S-Adenosyl methionine (SAM) is formed from methionine and serves as a cofactor in methylation reactions.
Regulators: regulate enzyme reactions in signal transduction pathways (e.g., GTP activates G proteins)
Carrier molecules: e.g., the electron carrier nicotinamide adenine dinucleotide (NAD⁺) and flavin adenine dinucleotide (FAD) as a component of coenzymes in redox reactions

cAMP; cyclic guanosine monophosphate (cGMP); S-adenosyl methionine (SAM); NAD⁺/NADH redox pair; FAD/FADH₂ redox pair

The energy carrier ATP contains ribose and not deoxyribose as a sugar, and therefore has a 2' OH group.

Overview of nucleic acids

Nucleic acids

Long, linear chains (polymers) of nucleotides
Alternating sugar and phosphate residues of individual nucleotides, linked by phosphodiester bonds, form the backbone
Primary structure of nucleic acids: nucleotide sequence in the chain
Phosphodiester bonds are negatively charged.
- Negative charges stabilize the nucleic acids.
- Unlike many other esters, phosphodiester bonds are not easily hydrolyzed.
The chemical composition of nucleic acids (DNA and RNA) and their structure of repetitive nucleotide units allow them to function as both information carrier and mediator.

DNA structure

Comparison of DNA and RNA

DNA vs. RNA
	DNA	RNA
Bases	Thymine; cytosine, adenine, guanine; modification, especially to 5-methylcytosine	Uracil; cytosine, adenine, guanine; many unusual or modified bases are possible
Sugar	Deoxyribose	Ribose
Length	Depends on the organism; ranges from several thousand to several million nucleotides	Varies considerably
Structure	Double-stranded helix; base pairing; superhelix; associates with proteins (particularly histones) for dense packaging in the nucleus	Usually single-stranded (except the double-stranded miRNA and siRNA); various 3D structures are possible, e.g., loops through the formation of short sections with base pairing (double-stranded)
Function	Carries the hereditary information (collectively known as the genome) for the construction and function of the organism	Varies considerably depending on class, e.g., coding, regulatory, or enzymatic function (see table “Classification of RNA” below)
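
The base difference in the table (thymine in DNA, uracil in RNA) is enough to sketch a simple sequence check and a DNA-to-RNA rewrite. The following is a minimal illustration with hypothetical function names (`classify`, `dna_to_rna`); it deliberately ignores the modified bases mentioned in the table.

```python
DNA_BASES = frozenset("ACGT")
RNA_BASES = frozenset("ACGU")


def classify(sequence: str) -> str:
    """Guess the nucleic acid type from the standard bases present."""
    bases = set(sequence.upper())
    if not bases:
        return "invalid"
    if bases <= DNA_BASES and "T" in bases:
        return "DNA"
    if bases <= RNA_BASES and "U" in bases:
        return "RNA"
    if bases <= DNA_BASES & RNA_BASES:
        return "ambiguous"  # only A, C, G: could be either
    return "invalid"


def dna_to_rna(coding_strand: str) -> str:
    """Write a DNA coding-strand sequence in RNA letters (T becomes U)."""
    seq = coding_strand.upper()
    if not set(seq) <= DNA_BASES:
        raise ValueError("expected only A, C, G, and T")
    return seq.replace("T", "U")


print(classify("ATGGCT"))    # DNA
print(classify("AUGGCU"))    # RNA
print(dna_to_rna("ATGGCT"))  # AUGGCU
```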

DNA structure and the human genome

Overview of double-stranded DNA

Organization of the human genome

Double-stranded chain of deoxyribonucleotides in cells
Both strands are complementary to each other and run anti-parallel.
- Nucleotides form single-stranded DNA that stabilizes into double-stranded DNA
- DNA adopts a right-handed double helix that binds histone octamers to form nucleosomes (appear as “beads on a string” under electron microscopy)
- Chromatin formation begins, which is then further compacted
- During cell division (mitosis or meiosis), chromatin maximally condenses into chromosomes (only visible during metaphase under light microscopy)
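
Because the two strands are complementary and run antiparallel, the sequence of one strand determines the other. A minimal sketch, assuming only the four standard DNA bases and a hypothetical helper name `complement_strand`:

```python
# Watson-Crick partners: A pairs with T, G pairs with C.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement_strand(strand_5_to_3: str) -> str:
    """Return the partner strand, also written in the 5'->3' direction.

    The complement is reversed because the partner strand runs antiparallel.
    """
    return strand_5_to_3.upper().translate(COMPLEMENT)[::-1]


print(complement_strand("ATGCCG"))  # CGGCAT
```

Applying the function twice returns the original sequence, which mirrors the fact that each strand can serve as a template for the other.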

Double helix

3D structure of DNA in which two polynucleotide strands are intertwined, stabilized by:
- Specific base pairing via hydrogen bonds (H bonds) between complementary nucleobases of DNA
  - A-T bonds consist of 2 H bonds
  - G-C bonds consist of 3 H bonds, resulting in a stronger bond (an ↑ in G and C in DNA → ↑ melting temperature of DNA)
- Hydrophobic effect: The hydrophilic, negatively charged sugar-phosphate backbone is located on the outside of the helix, the hydrophobic bases on the inside.
- Base stacking: The base pairs are stacked on one another (stacking interactions) and interact through van der Waals forces, which have an additional stabilizing effect.
Double helix has a minor groove and a major groove.

Conformations

B conformation (B-DNA)
- Most prevalent
- Right-handed double helix
- 10 base pairs per helical turn, spanning a length of 3.4 nm
- Diameter of the helix: 2 nm
- Bases are approx. perpendicular to the helix axis.
A conformation (A-DNA)
- Right-handed double helix, although broader and shorter than B-DNA
- Base pairs are not perpendicular to the helix axis but are slightly inclined toward the axis.
- Dehydrated form, i.e., present under experimental conditions and not in vivo.
Z conformation (Z-DNA)
- Left-handed double helix
- Stretched longer than B-DNA resulting in a smaller diameter
- Occurs in GC-rich sequences, although it is generally rare under physiological conditions
- The phosphate groups of the DNA backbone form a zigzag pattern.

Watson-Crick base pairing; DNA structure

Base pairs in DNA: guanine pairs with cytosine (3 H bonds), adenine pairs with thymine (2 H bonds).
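
The link between G-C content and duplex stability can be made concrete with a short calculation. The sketch below counts hydrogen bonds (3 per G-C pair, 2 per A-T pair) and, as an addition that is not part of this article, applies the Wallace rule of thumb for short oligonucleotides (2 °C per A or T, 4 °C per G or C). The function names are illustrative, and the temperature estimate is a rough approximation only.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of bases in the sequence that are G or C."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)


def hydrogen_bonds(seq: str) -> int:
    """H bonds in the duplex formed by seq and its complement."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return 3 * gc + 2 * at


def wallace_tm_celsius(seq: str) -> int:
    """Wallace rule estimate for short oligonucleotides (assumption)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return 4 * gc + 2 * at


for oligo in ("ATATATAT", "GCGCGCGC"):
    print(oligo, gc_fraction(oligo), hydrogen_bonds(oligo), wallace_tm_celsius(oligo))
# ATATATAT 0.0 16 16
# GCGCGCGC 1.0 24 32
```

For two oligonucleotides of equal length, the GC-rich one has more hydrogen bonds and a higher estimated melting temperature.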

Supercoils

Description: double helix that is itself coiled, also termed “superhelix”
Occurrence: especially in circular DNA molecules
- In prokaryotes: chromosome of bacteria, plasmids
- In eukaryotes
  - Mitochondrial DNA (circular)
  - “Inflexible” segment of linear, chromosomal DNA
Function: Supercoiled DNA molecules have a more compact structure than the relaxed form of DNA.

Palindrome

Description
- A palindrome is a sequence that reads the same forward and backward (e.g., eye, level, madam).
- The molecular biological use of the term “palindrome” is for inverted repeats (repeated sequence in the opposite direction).
Occurrence
- In palindromic sequences, a sequence of base pairs occurring over a certain segment is read identically on both complementary DNA strands, i.e., the sequence always reads the same on both strands in a 5'→3' direction.
- Bases may be present between the palindromic sequences that are not complementary.
- These segments are self-complementary and can form a hairpin loop.
- Results in the formation of a cross-shaped structure in double-stranded DNA
Function: Some proteins that are capable of binding DNA require palindromic sequences as a recognition sequence, e.g., steroid hormone receptors or restriction enzymes.
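
In the molecular sense described above, a site is palindromic when it equals its own reverse complement. A minimal check, using the EcoRI recognition site GAATTC as an example (the example site is an addition, not taken from this article; `is_palindromic` is a hypothetical helper name):

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def is_palindromic(site: str) -> bool:
    """True if the site reads the same 5'->3' on both strands."""
    site = site.upper()
    reverse_complement = site.translate(COMPLEMENT)[::-1]
    return site == reverse_complement


print(is_palindromic("GAATTC"))   # True
print(is_palindromic("GATTACA"))  # False
```

Note that an ordinary text palindrome is not necessarily palindromic in this sense: "ACA" reads the same in both directions as text, but its reverse complement is "TGT".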

Chromatin

Definition: complex of DNA and its associated proteins (both histones and non-histones) structured as repetitive units (nucleosomes)
Functions
- Condensation and organization of DNA (a very large molecule) allow for storage inside the nucleus and are important for gene regulation
- Chromatin remodeling
  - Opening chromatin structure from a compact state to a more accessible arrangement
  - Allows for transcription factors and RNA polymerase to access specific loci of genes
  - Facilitated by various enzyme remodelers (e.g., SWI/SNF ATPases), histone post-translational modifications (see below), and direct modification of DNA itself (e.g., DNA demethylation).
Types
- Heterochromatin
  - Contains inactive DNA because the highly condensed, steric conformation does not allow transcription
  - Darker on electron microscopy (EM)
  - DNA is highly methylated and histones are deacetylated
- Euchromatin
  - Contains active DNA because the less condensed steric conformation makes DNA accessible for transcription
  - Lighter on EM

Heterochromatin; human genome packaging

Heterochromatin is Hooked tight while Euchromatin is Easygoing.

Histones

Definition: group of proteins that bind to DNA in the nucleus of eukaryotes to support the structure of chromatin
Characteristics
- Positively charged through the high percentage (∼ 25%) of basic amino acids (arginine and lysine)
- Strong ionic interactions with negatively charged DNA (through phosphate groups on DNA)
- Synthesized during S phase in the cytosol and transported to the nucleus
Types: There are four core histones and a linker.
- 4 Core histones: H2A, H2B, H3, H4
  - 2 molecules of each core histone form a histone octamer (8-protein complex), the core of the nucleosome, around which the DNA is wound in segments
  - Gene expression is controlled via reversible post-translational modification of histones (acetylation, methylation, phosphorylation, ubiquitination, sumoylation, ADP-ribosylation)
    - Histone methylation
      - Occurs via histone methyltransferase, which targets lysine or arginine residues
      - Methylation usually suppresses transcription by enabling tighter DNA coiling.
      - Depending on the location and number of methyl groups that are attached, methylation can also promote transcription.
    - Histone acetylation
      - Acetylation of specific lysine residues (positively charged) in histone proteins → less positively charged histones → weaker binding of DNA → relaxation of DNA coiling → ↑ transcription activity
      - Similarly, histone deacetylation via histone deacetylase tightens the coiling of DNA and decreases transcription activity (see “Histone modification”).
      - Clinical implications: pathogenesis of Huntington disease (dysregulated acetylation, e.g., histone deacetylation altering gene expression); thyroid hormone-induced acetylation that influences thyroid hormone synthesis
- Linker histone (H1)
  - Structure: not completely known, but a different, less uniform structure than core histones
  - Function: binds to linker DNA and to the nucleosome, leading to stabilization of the chromatin fiber.

Histone Methylation Mutes transcription. Histone Acetylation Activates transcription.

Nucleosome (nucleosome core particle)

Definition: a structural and functional complex of DNA (∼ 150 bp) and histone octamer that gives chromatin its “beads on a string” appearance
Structure
- DNA wraps around the nucleosome core with ∼ 1.8 twists
- Nucleosomes are linked to one another through linker DNA (short DNA segment of variable lengths)
30 nm chromatin fiber (solenoid)
- Nucleosome strand that is coiled into a fiber with a diameter of 30 nm
- Each twist of the 30 nm fiber contains ∼ 6 nucleosomes.
Chromatin loop
- Condensed form of DNA beyond the nucleosome and 30 nm fibers
- The histone H1 and nonhistones are involved in the formation of loops.
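
The figures above allow a rough estimate of how many nucleosomes a genome requires. The sketch below is a back-of-the-envelope calculation only: the linker length of 50 bp is an assumed placeholder (the text only states that linker DNA is short and variable), and the genome size is the ∼ 3.2 billion bp quoted for the human genome in this article.

```python
NUCLEOSOMAL_DNA_BP = 150           # ~150 bp of DNA per nucleosome (from the text)
NUCLEOSOMES_PER_TURN = 6           # ~6 nucleosomes per twist of the 30 nm fiber
ASSUMED_LINKER_BP = 50             # assumption for illustration; linker length varies
HAPLOID_GENOME_BP = 3_200_000_000  # ~3.2 billion bp


def nucleosome_estimate(genome_bp: int, linker_bp: int = ASSUMED_LINKER_BP):
    """Return (number of nucleosomes, turns of 30 nm fiber) for a genome."""
    repeat_bp = NUCLEOSOMAL_DNA_BP + linker_bp
    nucleosomes = genome_bp // repeat_bp
    fiber_turns = nucleosomes // NUCLEOSOMES_PER_TURN
    return nucleosomes, fiber_turns


nucleosomes, turns = nucleosome_estimate(HAPLOID_GENOME_BP)
print(f"{nucleosomes:,} nucleosomes, {turns:,} fiber turns")
# 16,000,000 nucleosomes, 2,666,666 fiber turns
```

Changing the assumed linker length shifts the result proportionally, so the output should be read as an order of magnitude rather than a measured value.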

Chromosomes

See “Basics of human genetics” for more information.

Description
- A denser packaging of chromatin that only becomes visible under the microscope during cell division (especially in metaphase)
- Number of chromosomes in the human genome:
  - Somatic cells contain 23 pairs of homologous chromosomes (46 chromosomes in total).
  - Gametes (mature germ cells) contain only 23 unpaired chromosomes (haploid set).
Structure: A duplicated chromosome consists of 2 identical sister chromatids connected by a centromere.

Human genome

The human genome consists of ∼ 3.2 billion base pairs (bp).
The DNA stored in a human cell would total ∼ 1.8 m in length.
In addition to the nuclear genome (found in the nucleus), there is also a mitochondrial genome that largely codes for RNA-associated proteins.
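
The order of magnitude of the total DNA length can be checked against the B-DNA geometry given earlier (10 base pairs per 3.4 nm of helix). The sketch below assumes that all DNA is in the B conformation and that the ∼ 3.2 billion bp refer to one (haploid) chromosome set; both are simplifications, and the constant and function names are illustrative.

```python
BP_PER_TURN = 10          # B-DNA: 10 base pairs per helical turn
TURN_LENGTH_NM = 3.4      # B-DNA: one turn spans 3.4 nm
HAPLOID_GENOME_BP = 3.2e9

RISE_PER_BP_NM = TURN_LENGTH_NM / BP_PER_TURN  # 0.34 nm per base pair


def contour_length_m(base_pairs: float) -> float:
    """Length in meters of fully stretched B-DNA with this many base pairs."""
    return base_pairs * RISE_PER_BP_NM * 1e-9


haploid_m = contour_length_m(HAPLOID_GENOME_BP)
diploid_m = 2 * haploid_m
print(f"{haploid_m:.2f} m per haploid set, {diploid_m:.2f} m per diploid cell")
# 1.09 m per haploid set, 2.18 m per diploid cell
```

Under these assumptions the estimate comes out at roughly 1 m per chromosome set and about 2 m per diploid cell, the same order of magnitude as the ∼ 1.8 m stated above.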

Nuclear genome

∼ 10% contains genes and related sequences
- ∼ 3% of the nuclear genome codes for proteins (including introns) and RNAs.
- ∼ 1% of the nuclear genome only codes for exons.
∼ 90% does not contain genes
- The function of ∼ 50% of DNA sequences is unknown.
- ∼ 45% is composed of repetitive sequences (repetitive genetic elements).
  - Simple repetitive DNA elements (tandem repeats)
    - Satellite DNA: repetitive sequences of up to 18,000 nucleotides
    - Minisatellite DNA: repetitive sequences from 3–100 nucleotides
    - Microsatellite DNA: repetitive sequences from 2–6 nucleotides
  - Previously mobile genetic elements (such as transposons, LTR, non-LTR, LINE, SINE)
- ∼ 24% of the genome is spanned by introns.

Mitochondrial genome (mitochondrial DNA, mtDNA)

Circular genome of ∼ 16,500 bp
Mitochondrial DNA does not utilize histones for DNA packaging.
Over 90% of mtDNA codes for structural genes, including for mRNA, tRNA, and rRNA.
- 13 genes that code for proteins, i.e., for mRNAs
- 22 genes for tRNAs
- 2 genes for rRNA

RNA: Structure and characteristics

RNA classes and their structure

RNAs can be differentiated into various types, which differ in their length, structure, and function. Depending on the type, RNA can be a single-stranded or double-stranded segment.

Classification of RNA
	Function	Structure
mRNA (messenger RNA)	Coding RNA that functions as a template for translation in protein synthesis; synthesized in the nucleus by RNA polymerase using DNA as a template (transcription); see “Gene expression and transcription.”	Very variable structure and length, because the nucleotide sequence of mRNA depends on the nucleotide sequence of the corresponding DNA segment; in eukaryotes, the initial transcript from DNA is known as heterogeneous nuclear RNA (hnRNA); pre-mRNA: hnRNA that undergoes posttranscriptional modifications to become mRNA
tRNA (transfer RNA)	Adapter molecule in protein synthesis; transports amino acids to the ribosome; see “Gene expression and transcription.”	Secondary structure: composed of 75–90 nucleotides; contains a large percentage of chemically modified bases; forms a characteristic cloverleaf structure through intramolecular base pairing. Anticodon loop: contains a three-base binding site (anticodon) for recognizing complementary mRNA sequences (codon); located opposite the acceptor stem. Acceptor stem: found in prokaryotes and eukaryotes; 3'-OH end with the sequence 5'-CCA-3'; binding site where the amino acid is covalently bound, forming a charged tRNA whose amino acid matches the anticodon. D-arm (or dihydrouridine loop): contains dihydrouridine molecules essential for correct tRNA recognition by the aminoacyl-tRNA synthetase. T-arm (or TΨC loop): contains modified bases (ribothymidine, pseudouridine, cytidine) necessary for tRNA ribosome binding
rRNA (ribosomal RNA)	Fulfills structural and functional (catalytic) tasks as part of the ribosome during protein synthesis	5S, 5.8S, 18S, and 28S rRNA; 18S rRNA: component of the small subunit of ribosomes (40S); 5S, 5.8S, and 28S rRNA: components of the large subunit of ribosomes (60S); 28S rRNA catalyzes the formation of peptide bonds in the ribosome (often referred to as a ribozyme)
snRNA (small nuclear RNA)	Class of noncoding RNAs in the nucleus; component of the spliceosome; involved in the splicing of pre-mRNA	Composed of several hundred nucleotides
snoRNA (small nucleolar RNA)	Class of noncoding RNAs in the nucleolus; modifies RNA molecules, especially rRNAs, including through methylation of ribose residues	Composed of 100–170 nucleotides
RNA component of signal recognition particles (scRNA; small cytoplasmic RNA)	7S RNA that, together with six protein components, forms the signal recognition particle (SRP), which is responsible for the transport of newly formed proteins in the ribosome to intracellular compartments in the cytoplasm	Composed of 300 nucleotides; complex structure with many double-helical segments
Telomerase RNA component (human telomerase RNA, hTR)	Nucleic acid component of telomerase; serves as an RNA template by which the telomerase extends the free ends of the genomic DNA during DNA replication to prevent loss of coding DNA segments. Telomerase is therefore a reverse transcriptase that brings its own template.	In humans, the template sequence is 5'-UAACCCUA-3'; composed of 451 nucleotides; does not have a poly(A) tail
miRNA (microRNA)	Class of regulatory, noncoding RNAs, naturally found in cells in the form of hairpin structures; encoded in introns; regulates gene expression: binding to the 3' untranslated region of an mRNA via nucleotide pairing prevents translation and accelerates the degradation of certain mRNAs; miRNA binds loosely to mRNA, thereby allowing a higher number of related mRNAs to bind it; dysfunctional miRNA expression may contribute to the development of some cancers (e.g., a miRNA that silences the mRNA of a tumor suppressor gene)	Composed of ∼ 20–30 nucleotides; formed from precursor molecules with a 5' cap and a poly(A) tail, which are then cleaved into smaller oligonucleotides
siRNA (small interfering RNA)	Class of regulatory, noncoding RNAs that most commonly arise from exogenous dsRNA sources (e.g., viruses); regulates gene expression via highly specific nucleotide pairing → ↑ mRNA degradation → ↓ mRNA translation; experimental use: gene “knockdown”	Composed of ∼ 20–30 nucleotides; formed from double-stranded precursor molecules by a mechanism similar to that of miRNA

Structure of tRNA (cloverleaf model); ribosome assembly

To remember the features of the two tRNA arms, think: “Dihydrouridine and Detection” for the D-arm and “tRNA Tethering” for the T-arm.

“CCA Can Catch Amino acids” (function of the 5'-CCA-3' sequence in tRNA).
