Gene expression and transcription

Last updated: September 18, 2025

Summary

The genome contains the hereditary information for the structure and function of a cell or organism. This information is stored as a sequence of bases in DNA. A relatively small percentage of DNA codes for proteins and ribonucleic acids (RNAs), while a large part of the genome is composed of sequences without a clear function. The conversion of the information stored within DNA into functional molecules (RNA and proteins) is termed gene expression. Gene expression occurs in two stages: transcription and translation. During transcription, DNA is copied into RNA. RNA is then used to synthesize proteins during translation.

Key enzymes involved in transcription are DNA-dependent RNA polymerases. These enzymes synthesize the RNA molecule based on the genes encoded in DNA, which contain starting sites (promoters) where transcription begins. Transcription factors are required to recognize the promoter. RNA polymerase moves along the template strand of the double-stranded DNA. The strand is synthesized until the end of the DNA segment (termination site) is reached. In eukaryotes, the newly formed primary transcript is further modified to be, for example, available for protein synthesis.

Gene expression is strongly regulated at all levels. Some genes are expressed in all cells and are required as housekeeping genes for basic cellular functions (i.e., constitutive expression). Other genes are only active in certain cells; their expression is regulated by a variety of mechanisms. Genes can undergo activation or silencing, and transcription depends on the presence of specific DNA-binding proteins. The newly formed RNA may also be degraded after transcription by various mechanisms before use in protein synthesis. There are also regulatory mechanisms at a translational level. Although each cell in an organism contains the same DNA, the regulated expression of certain genes causes the cells to specialize and assume different functions, e.g., muscle cells or hepatocytes.

Overview

Gene expression: conversion of genetic information stored in DNA into a functional gene product (RNA and proteins)
Protein synthesis: process of gene expression (comprised of transcription and translation) as well as post-transcriptional modifications (see the article on translation and protein synthesis for more information)
Central dogma of molecular biology: genetic information generally flows in one direction from DNA to RNA to protein
- DNA → (transcription) → RNA → (translation) → protein
- Exception: retroviruses, which are able to produce DNA from RNA using their own enzyme reverse transcriptase (reverse transcription)

Figures: Gene expression in prokaryotes and eukaryotes; Mature mRNA

In protein synthesis, DNA is initially transcribed into mRNA (transcription) and mRNA is translated into an amino acid chain (translation).
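
The two stages can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the input is taken to be the sense (coding) strand written 5′ → 3′, the function names are made up for this example, and the codon table is deliberately partial and contains only the codons used here.

```python
# Partial codon table (standard genetic code); only the codons needed below.
# Any other codon would raise a KeyError in this toy example.
CODON_TABLE = {
    "AUG": "Met", "GCU": "Ala", "AAA": "Lys", "UGG": "Trp",
    "UAA": "Stop", "UAG": "Stop", "UGA": "Stop",
}

def transcribe(sense_dna: str) -> str:
    """Return the mRNA that matches the sense strand (T replaced by U)."""
    return sense_dna.upper().replace("T", "U")

def translate(mrna: str) -> list[str]:
    """Read codons from the first AUG until a stop codon is reached."""
    start = mrna.find("AUG")
    if start == -1:
        return []
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "Stop":
            break
        peptide.append(amino_acid)
    return peptide

mrna = transcribe("ATGGCTAAATGGTAA")   # DNA -> RNA (transcription)
peptide = translate(mrna)              # RNA -> protein (translation)
print(mrna, peptide)
```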

Transcription

In transcription, DNA serves as a template to produce a complementary RNA molecule. Only a single strand of the double-stranded DNA (dsDNA) is read.

DNA strands
- Sense strand: the strand of the double-stranded DNA that is complementary to the antisense strand and has an almost identical base sequence to the mRNA that is transcribed along the antisense strand (with thymine in place of uracil). The sense strand is not involved in the transcription process.
- Antisense strand: the strand of the double-stranded DNA that is used as a template for transcription to produce the complementary mRNA strand
Promoter
- Specific DNA sequence located upstream (= in the 5′ region) of a gene that regulates transcription
- Contains AT-rich sequences (e.g., TATA box and CAAT box)
- Binding site for RNA polymerase II and several other transcription factors at the start of transcription
- Mutations at the site of promoters usually lead to severely decreased transcription rate.
- Promoters increase the replication activity at the origin of replication (ori) ^[1]
Exon-intron structure: eukaryotic genes are composed of alternating coding and noncoding regions
- Introns: contain only noncoding DNA sequences, but are essential in the regulation of gene expression
- Exons: contain protein-coding DNA sequences
Substrates: the nucleoside triphosphates ATP, GTP, CTP, and UTP
Enzymes: RNA polymerases
General transcription factors: specific helper proteins that help RNA polymerase find and bind to the promoter and initiate RNA synthesis
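
The relationship between the sense strand, the antisense (template) strand, and the mRNA can be checked with a short sketch. The sequence and the function names are hypothetical and chosen only for illustration; all strands are written 5′ → 3′.

```python
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
RNA_PARTNER = {"A": "U", "T": "A", "G": "C", "C": "G"}  # DNA base -> paired RNA base

def antisense_from_sense(sense: str) -> str:
    """Antisense (template) strand, written 5'->3' (reverse complement)."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(sense))

def mrna_from_template(template: str) -> str:
    """mRNA written 5'->3', obtained by pairing against the template strand."""
    return "".join(RNA_PARTNER[base] for base in reversed(template))

sense = "ATGGCCTTA"                       # toy sense strand
template = antisense_from_sense(sense)    # strand actually read by the polymerase
mrna = mrna_from_template(template)

# The mRNA matches the sense strand except that U replaces T.
print(template, mrna, mrna == sense.replace("T", "U"))
```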

“Introns are intervening introverts”: Introns are found between (Latin “inter”) protein-coding DNA sequences and stay in the nucleus.

“Exons are expressive extroverts”: Exons contain protein-coding DNA sequences that will be expressed and exit the nucleus.

Figures: Upstream and downstream nucleotides; Functional organization of eukaryotic genes

RNA polymerases and transcription factors

RNA polymerases

Transcription reactions are catalyzed by (DNA-dependent) RNA polymerases. In eukaryotic cells, there are various types of RNA polymerase, which recognize different promoter types and transcribe different types of genes. In prokaryotes, on the other hand, there is only one type of RNA polymerase that transcribes all three types of RNA.

Structure: composed of two large subunits with many polypeptide chains
Function: synthesis of a new RNA strand from 5′ to 3′ direction; reading of the DNA strand from 3′ to 5′ direction
- Unwinds DNA without help of another enzyme (intrinsic helicase activity)
- Initiates transcription (RNA polymerase II opens DNA in the promoter region)
- Has intrinsic proofreading function

Overview of RNA polymerases
Type of RNA polymerase	Transcripts	Location
RNA polymerase I (most common type)	rRNA (5.8S, 18S, and 28S rRNA)	Nucleus (nucleolus)
RNA polymerase II	hnRNA, mRNA, microRNA (miRNA), small nuclear RNA (snRNA), small nucleolar RNA (snoRNA)	Nucleus (euchromatin region)
RNA polymerase III	tRNA, rRNA (5S rRNA), snRNA, snoRNA	Nucleus (nucleolus and euchromatin region)
Mitochondrial RNA polymerase	Mitochondrial RNA	Mitochondrion

RNA polymerase II transcribes almost all genes that code for proteins.

The RNA polymerases are numbered in the order in which their products are utilized in the process of protein synthesis! I, II, and III → rRNA, mRNA, and tRNA, respectively.
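
As a study aid, the overview can be encoded as a small lookup. This is only a sketch: the dictionary mirrors the transcripts listed above, and the function name is hypothetical.

```python
# Transcripts per RNA polymerase, as listed in the overview above.
POLYMERASE_TRANSCRIPTS = {
    "RNA polymerase I": {"5.8S rRNA", "18S rRNA", "28S rRNA"},
    "RNA polymerase II": {"hnRNA", "mRNA", "miRNA", "snRNA", "snoRNA"},
    "RNA polymerase III": {"tRNA", "5S rRNA", "snRNA", "snoRNA"},
    "Mitochondrial RNA polymerase": {"mitochondrial RNA"},
}

def polymerases_for(transcript: str) -> list[str]:
    """Return every polymerase whose listed products include the transcript."""
    return sorted(
        name
        for name, products in POLYMERASE_TRANSCRIPTS.items()
        if transcript in products
    )

print(polymerases_for("mRNA"))   # one polymerase
print(polymerases_for("snRNA"))  # listed for two polymerases
```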

Transcription factors

RNA polymerases require helper proteins for promoter recognition of the genes to be transcribed.

General transcription factors: bind to specific base sequences of chromosomal DNA and thereby enable binding of RNA polymerase to the proximal promoter region → start of transcription
Specific transcription factors
- Modulate transcription by binding to regulatory elements (enhancers, silencers)
- Example: steroid hormone receptors

DNA-binding proteins

Proteins, such as transcription factors that bind to DNA, require specific protein domains, also termed structural motifs. These structural motifs usually use either an α-helix or a β sheet to bind to the major groove of DNA. Transcription factors have DNA-binding domains through which they are able to interact with specific DNA segments to perform their function. Numerous structural motifs of DNA-binding domains have been identified. Important examples are the zinc finger domains, leucine zippers, basic helix-loop-helix, and the homeobox.

Zinc finger
- Characteristics: zinc ion coordinated by two histidine and two cysteine residues
- DNA binding: Several zinc finger domains are often connected as a chain; each binds with an α-helix in the major groove of DNA.
Leucine zipper
- Characteristics
  - Two long α-helices that bind to one another through their hydrophobic regions and form a supercoil
  - Because every seventh amino acid residue is leucine and the residues intertwine like a zipper, this structural motif is termed leucine zipper.
- DNA binding: The DNA-binding hydrophilic regions of α-helices contain many basic residues that interact with the major groove of DNA.
Basic helix-loop-helix
- Characteristics
  - Two polypeptide chains comprising a short and a long α-helix connected by a flexible loop (does not have a secondary structure).
  - The two polypeptide chains dimerize via their helix-loop-helix regions.
- DNA binding: The short basic α-helix interacts with DNA.
Homeobox (with helix-turn-helix)
- Characteristics: a polypeptide chain with three short, successive α-helices, with the third α-helix perpendicular to the first two α-helices through a turn
- DNA binding: The third, relatively basic α-helix binds as a recognition helix, especially to exposed bases in the major groove of DNA.

Figures: Zinc finger motif; Leucine zipper; Basic helix-loop-helix; Homeobox

An important structural motif of DNA-binding proteins is an α-helix with many basic amino acid residues.

Stages of transcription

Transcription is divided into three phases: initiation, elongation, and termination.

Initiation (transcription): the start of transcription by the formation of the initiation complex and unwinding of DNA
1. Preinitiation complex (RNA polymerase-promoter closed complex) formation by binding of general transcription factors and RNA polymerase to the promoter region (e.g., TATA box, CAAT box, GC box)
2. Formation of a transcription bubble by unwinding the DNA double helix into single strands over a length of 10–12 bases (open complex)
3. Start of RNA synthesis
Elongation
- Extension of the RNA strand
- 3′OH group of the growing RNA strand is attached to the α-phosphate group of the next complementary nucleoside triphosphate
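Termination: RNA synthesis stops when the termination site is reached and the transcript is released; in eukaryotes, polyadenylation starts during termination.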

Figures: Transcription of RNA; Transcription bubble; DNA supercoils; DNA-RNA base pairs during transcription

During transcription, base pairing occurs between DNA and RNA. Uracil (instead of thymine) in RNA pairs with adenine in DNA.

RNA and DNA pair in an antiparallel orientation: the 5′ end of one strand lies opposite the 3′ end of the other strand and vice versa. In both cases, the base sequences are written in the usual 5′ → 3′ direction.
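
Because both sequences are written 5′ → 3′, one of them has to be reversed before the bases are compared position by position. A minimal sketch of such a check, with a hypothetical function name and toy sequences:

```python
# Allowed (DNA base, RNA base) pairs; uracil in RNA pairs with adenine in DNA.
PAIRS = {("A", "U"), ("T", "A"), ("G", "C"), ("C", "G")}

def pairs_antiparallel(dna_5to3: str, rna_5to3: str) -> bool:
    """True if the RNA, read 3'->5', pairs base by base with the DNA read 5'->3'."""
    if len(dna_5to3) != len(rna_5to3):
        return False
    return all(
        (dna_base, rna_base) in PAIRS
        for dna_base, rna_base in zip(dna_5to3, reversed(rna_5to3))
    )

template = "TAAGGCCAT"                              # toy template strand, 5'->3'
print(pairs_antiparallel(template, "AUGGCCUUA"))    # RNA written 5'->3'
print(pairs_antiparallel(template, "AUUCCGGUA"))    # same RNA written 3'->5' by mistake
```

The second call fails because the RNA was written in the wrong direction, which is the most common slip when pairing sequences by hand.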

Post-transcriptional modification (RNA processing)

In eukaryotes, the end product of transcription is heterogeneous nuclear RNA (hnRNA), which is then transformed into mature mRNA through post-transcriptional modifications in the nucleus. These modifications include capping, polyadenylation, splicing, and RNA editing. The mature mRNA then leaves the nucleus and enters the cytosol.

Capping

Definition: addition of a cap of 7-methylguanosine to the 5′ end of hnRNA to form the five-prime cap
Process
1. Cleavage of the terminal (γ) phosphate group from the 5′ triphosphate end of hnRNA by RNA triphosphatase
2. Addition of a GMP residue (formed from GTP with cleavage of pyrophosphate) to the 5′ diphosphate end of hnRNA by guanylyltransferase
3. Methylation of one, two, or three ribose residues of hnRNA with S-adenosylmethionine (SAM) as a methyl group donor
Function
- Protects against degradation (through exonucleases)
- Initiation of translation

Polyadenylation

Definition: addition of a tail of ∼200 adenosine monophosphates (poly(A) tail) to the 3′ end of hnRNA
Process
1. Polyadenylation signal on hnRNA: AAUAAA
2. Poly(A) polymerase
  - Binds at the cleavage site and adds ∼ 50–250 adenosine monophosphate residues, using ATP as the substrate
  - Does not need a template for polyadenylation
Function
- ↑ Stability (protects against early degradation)
- Initiates translation
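
The process can be mimicked on a string, as a rough sketch only. The function name and the `cleavage_offset` parameter are hypothetical: the text above states only that cleavage occurs in relation to the AAUAAA signal, so the distance used here is an arbitrary illustration value, not a biological constant.

```python
POLY_A_SIGNAL = "AAUAAA"

def polyadenylate(hnrna: str, cleavage_offset: int = 15, tail_length: int = 200) -> str:
    """Cleave downstream of the first AAUAAA signal and append a poly(A) tail.

    cleavage_offset is an assumed number of bases between the end of the
    signal and the cleavage site (illustrative only).
    """
    signal_start = hnrna.find(POLY_A_SIGNAL)
    if signal_start == -1:
        return hnrna  # no signal found: transcript is left unchanged in this toy model
    cleavage_site = min(len(hnrna), signal_start + len(POLY_A_SIGNAL) + cleavage_offset)
    # The tail is added without a template, one adenosine monophosphate at a time.
    return hnrna[:cleavage_site] + "A" * tail_length

result = polyadenylate("GGCUAAUAAAGCGCGCGCUUUU", cleavage_offset=4, tail_length=10)
print(result)
```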

Splicing

Overview

Definition: excision of introns from hnRNA transcripts and direct linkage of exons
Function: excision of introns so that the resulting mature mRNA only contains relevant information in the form of exons

Process

Spliceosome formation at the exon-intron border
- Complex of:
  - Various snRNAs that are bound to proteins and form snRNPs (small nuclear ribonucleoproteins)
    - Pronounced “snurps”
    - Antibodies against snRNPs can be found in SLE (anti-Smith antibodies) and mixed connective tissue disease (anti-U1 RNP antibodies)
  - The hnRNA to be modified
  - Many other small proteins
- Involved sequence segments on the hnRNA:
  - Exon-intron borders: characterized by specific base sequences (consensus sequences) on the RNA
    - 5′ splice site
    - 3′ splice site
  - Branch point: adenine nucleotide located in the intron, on which a lariat structure is formed (see below)
  - Pyrimidine-rich sequence in front of the 3′ splice site
- Mutations in the intronic splice site of the β-globin locus result in improper splicing, which leads to expression of abnormal β-globin in beta-thalassemia.
- Defective snRNP assembly can lead to congenital conditions such as spinal muscular atrophy, in which assembly is impaired due to decreased SMN protein.
Opening of the exon-intron border at the 5′ splice site: A temporary lariat structure with a 2′ → 5′ phosphodiester bond is formed at the branch point, which brings the two exon ends to be joined into proximity (loop formation)
Opening of the exon-intron border at the 3′ splice site
Joining of the exon ends

Figures: Exon-intron boundaries; Splicing; Mature mRNA

The exons of a gene are the coding segments; the introns are removed from hnRNA by splicing.
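
The net effect of splicing on the sequence can be sketched as follows. The spliceosome chemistry (lariat formation, snRNPs) is not modeled; intron positions are simply passed in as index ranges, and the toy transcript, with introns in lowercase, is made up for illustration.

```python
def splice(hnrna: str, introns: list[tuple[int, int]]) -> str:
    """Remove introns given as (start, end) half-open index ranges and join the exons."""
    exons = []
    position = 0
    for start, end in sorted(introns):
        exons.append(hnrna[position:start])  # exon in front of this intron
        position = end                       # continue after the intron
    exons.append(hnrna[position:])           # last exon
    return "".join(exons)

# Toy hnRNA: exons in uppercase, introns in lowercase.
hnrna = "AUGGCC" + "guaagu" + "AAAUGG" + "guccag" + "UAA"
mature_mrna = splice(hnrna, [(6, 12), (18, 24)])
print(mature_mrna)
```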

RNA editing

Definition: alteration of RNA base sequences by the insertion, deletion, or modification of individual bases (independent of splicing)
Function: possibility of producing various proteins
Examples
- A-to-I editing: adenosine is deaminated to inosine, i.e., the base adenine is converted to hypoxanthine
  - Occurs in hnRNA
  - Enzyme: adenosine deaminases acting on RNA (ADARs)
    - Example: A-to-I editing of various subunit types of the glutamate receptor can alter their characteristics, which influences the effect of glutamate in the CNS.
- C-to-U editing: Cytidine is deaminated to uridine, i.e., the base cytosine is converted to uracil
  - Occurs in mRNA
  - Typical example of C-to-U editing
    - The mRNA for apolipoprotein B (apoB) codes for apoB-100.
    - After editing, the mRNA for apoB codes for a markedly smaller protein, apoB-48, because deamination of cytidine to uridine by cytidine deaminase generates a stop codon.
    - Tissue specificity: enterocytes edit the mRNA and form apoB-48, whereas hepatocytes form apoB-100 from the unedited mRNA.
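
How a single C-to-U change shortens the protein can be shown on a toy transcript. The sequence below is not the apoB mRNA; it is a made-up reading frame that contains a CAA codon (glutamine), which becomes the stop codon UAA after editing. Function names and the partial codon table are assumptions for this sketch.

```python
# Partial codon table (standard genetic code); only the codons used below.
CODONS = {
    "AUG": "Met", "GCU": "Ala", "CAA": "Gln",
    "AAA": "Lys", "UGG": "Trp", "UAA": "Stop",
}

def read_codons(mrna: str) -> list[str]:
    """Translate an in-frame toy mRNA from its first base until a stop codon."""
    peptide = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODONS[mrna[i:i + 3]]
        if amino_acid == "Stop":
            break
        peptide.append(amino_acid)
    return peptide

def edit_c_to_u(mrna: str, position: int) -> str:
    """Deaminate the cytidine at the given index to uridine."""
    if mrna[position] != "C":
        raise ValueError("C-to-U editing requires a cytidine at this position")
    return mrna[:position] + "U" + mrna[position + 1:]

unedited = "AUGGCUCAAAAAUGGUAA"      # AUG GCU CAA AAA UGG UAA
edited = edit_c_to_u(unedited, 6)    # CAA -> UAA

print(read_codons(unedited))  # full-length product (analogous to apoB-100)
print(read_codons(edited))    # truncated product (analogous to apoB-48)
```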

Figures: Adenine; Hypoxanthine; Cytosine; Uracil; RNA editing

Alternative splicing

Definition: removal of introns within hnRNA with differential joining of exons
Process: similar to splicing with additional splicing factors that determine the range of splice locations
Function
- Various proteins can be produced from a single hnRNA sequence, which allows for increased information density of DNA
- The formation of new proteins is facilitated: more rapid adaptation to altered living conditions
Examples
- Different types of tropomyosin (muscle)
- Dopamine receptors (brain)
- Immunoglobulins (secreted versus membrane-bound)

Figure: Alternative splicing

The one gene-one enzyme hypothesis does not apply to eukaryotes. A variety of proteins can be formed from one gene by alternative splicing.
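
The gain in information density can be made concrete by enumerating transcripts. The sketch below assumes the simplest case only, exon skipping with fixed exon order; the exon sequences, the choice of which exons are optional, and the function name are all hypothetical.

```python
from itertools import combinations

def isoforms(exons: list[str], optional: set[int]) -> set[str]:
    """Mature mRNAs obtained by keeping all constitutive exons and any subset
    of the optional exons (exon order is preserved)."""
    results = set()
    optional_sorted = sorted(optional)
    for k in range(len(optional_sorted) + 1):
        for skipped in combinations(optional_sorted, k):
            results.add(
                "".join(exon for i, exon in enumerate(exons) if i not in skipped)
            )
    return results

exons = ["AUGGCC", "AAAUGG", "GCUAAA", "UAA"]   # toy exons of one gene
variants = isoforms(exons, optional={1, 2})     # exons 1 and 2 may be skipped
for variant in sorted(variants):
    print(variant)
```

With two optional exons, one hnRNA already yields four different mature mRNAs.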

Quality control of mRNA

Location: cytoplasmic processing bodies (P-bodies)
- Contain exonucleases, decapping enzymes, and microRNAs
- Function
  - Degradation of mRNA
  - Storage for future translation

Regulation of transcription

Because transcription and protein synthesis require large amounts of energy, gene expression is strongly regulated. While some genes are continuously transcribed, other genes undergo regulation.

Prokaryotic gene regulation (operon model, Jacob-Monod Model)

Regulation of gene expression was initially analyzed in E. coli. Regulatory sequences in the bacterial genome ensure expression of the enzyme β-galactosidase if the sugar lactose is available as an energy source. Other proteins associated with lactose metabolism are synthesized at the same time, so regulation involves the coordinated expression of several genes.

Definition: a model for describing the gene regulatory mechanism in prokaryotes
- An operon is a transcriptional unit of DNA found in prokaryotes that is composed of regulatory elements and several genes that code for proteins.
- A polycistronic mRNA is formed.
  - The genes in the operon are transcribed into a single mRNA.
  - This mRNA codes for all proteins of the operon.
Function: adapt to changing environmental conditions by simultaneously increasing the expression of certain related genes
Example: lac operon
- Description: A transcriptional unit of genes for enzymes involved in lactose metabolism that is only expressed in the presence of lactose (e.g. β-galactosidase). The lac operon represents a classic example of how the environment creates a genetic response.
- Components (in their order in the genome)
  - Regulatory gene lacI: does not directly belong to the lac operon but codes for a repressor protein that binds to the lac operator in the absence of lactose and prevents transcription
  - Promoter: binding site for catabolite activator protein (CAP) and RNA polymerase in transcription
  - Operator: binding site of the repressor that overlaps with the promoter
  - lacZ: β-galactosidase gene
  - lacY: permease gene
  - lacA: transacetylase gene
- Regulation
  - Presence of glucose and absence of lactose → the lac repressor binds to the operator → polymerase cannot bind the promoter → transcription cannot take place → very few β-galactosidase molecules in the cell
  - Absence of glucose and presence of lactose → ↑ transcription
    - Low glucose → ↑ activity of adenylate cyclase → ↑ cAMP → ↑ activation of CAP, which binds at the promoter
    - Lactose binds to the lac repressor → inactivation and dissociation from the operator → promoter is free for polymerase → number of β-galactosidase molecules in the cell increases by 1000-fold
  - Presence of glucose and lactose: very low basal expression of lac genes (the repressor dissociates, but CAP is not activated)

Figures: The lac operon; Prokaryotic gene regulation: lac operon

In the lac operon, the repressor binds to the operator and prevents transcription of the operon genes in the absence of lactose.
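
The regulation of the lac operon behaves like a small truth table, which can be written down as a sketch. The function name and the output labels ("repressed", "basal", "high") are hypothetical; the case without glucose and without lactose is not listed above and is inferred here from the rule that the repressor stays bound whenever lactose is absent.

```python
def lac_expression(glucose: bool, lactose: bool) -> str:
    """Qualitative expression level of the lac genes in this simplified model."""
    repressor_on_operator = not lactose   # lactose inactivates the repressor
    cap_activated = not glucose           # low glucose -> high cAMP -> CAP active

    if repressor_on_operator:
        return "repressed"                # polymerase cannot bind the promoter
    if cap_activated:
        return "high"                     # free promoter plus CAP activation
    return "basal"                        # free promoter but no CAP activation

for glucose in (True, False):
    for lactose in (True, False):
        print(f"glucose={glucose!s:5} lactose={lactose!s:5} -> "
              f"{lac_expression(glucose, lactose)}")
```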

Eukaryotic gene regulation

Regulation of gene expression is significantly more complex in eukaryotes than in prokaryotes. One reason is that eukaryotes have a significantly larger genome. Another reason is that the DNA in the eukaryotic nucleus is strongly condensed and packaged as chromatin, which makes it less accessible than prokaryotic DNA. However, a common feature of eukaryotes and prokaryotes is the importance of activators and repressors, which bind specific DNA sequences and increase or inhibit gene expression.

Distal regulatory elements: DNA sequences that can affect the transcription rate of a gene and can be located upstream of, downstream of, or within (e.g., in an intron of) the gene they regulate
- Enhancers
  - Short DNA sequences ∼ 20 bp in length
  - Mainly a palindrome or a tandem repeat
  - When specific transcription factors (activators) bind to enhancers, the transcription rate of a gene on the same chromosome increases.
    - These transcription factors may be ligand-dependent or ligand-independent.
    - Ligand-dependent transcription factors: Intracellular hormone receptors that interact with enhancer sequences after hormone binding in the nucleus and increase the transcription rate of the genes to be controlled.
  - Example of an enhancer: hypoxia-response element (HRE)
    - The transcription factor hypoxia-inducible factor (HIF) binds to the HRE sequence during hypoxia and induces certain target genes that are important in the response to hypoxia, e.g., expression of EPO and VEGF.
    - In normoxia (sufficient amount of oxygen), HIF is hydroxylated by HIF prolyl hydroxylase. Hydroxy-HIF is ubiquitinylated and degraded in the proteasome and is unable to increase the expression of its own target genes.
- Silencers
  - Specific DNA sequence
  - When specific transcription factors (repressors) bind to silencers, the transcription rate of a gene on the same chromosome decreases.
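
The oxygen-dependent control of HIF described for the hypoxia-response element can be followed step by step in a short sketch. It is a qualitative model only; the function name and the dictionary keys are hypothetical, and EPO and VEGF stand in for the target genes named above.

```python
def hif_response(oxygen_sufficient: bool) -> dict:
    """Follow HIF from hydroxylation to target gene induction."""
    hydroxylated = oxygen_sufficient      # HIF prolyl hydroxylase acts in normoxia
    degraded = hydroxylated               # hydroxy-HIF is ubiquitinylated and degraded
    binds_hre = not degraded              # only stable HIF can bind the enhancer
    induced_genes = ["EPO", "VEGF"] if binds_hre else []
    return {
        "hif_hydroxylated": hydroxylated,
        "hif_degraded": degraded,
        "hif_bound_to_hre": binds_hre,
        "induced_genes": induced_genes,
    }

print(hif_response(oxygen_sufficient=True))    # normoxia
print(hif_response(oxygen_sufficient=False))   # hypoxia
```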

Figure: Principles of transcriptional regulation via distal regulatory DNA sequences

Transcriptional inhibitors

Transcriptional inhibitors are strong cytotoxins, but some can also be used therapeutically, e.g., as antibiotics.

Inhibitor	Mechanism	Occurrence/use
α-amanitin	Inhibits eukaryotic RNA polymerase II; severely hepatotoxic	Toxin found in Amanita phalloides (death cap mushrooms)
Rifampicin	Inhibits prokaryotic DNA-dependent RNA polymerase	Used as an antibiotic
Actinomycin D (dactinomycin)	Inhibits prokaryotic and eukaryotic RNA polymerase	Used as a chemotherapeutic agent
