A codon is a sequence of three DNA (or RNA) bases that codes for a specific amino acid or acts as a stop signal. There are 64 possible codons (4^3 = 64) mapping to 20 amino acids and 3 stop codons. This redundancy means multiple codons can code for the same amino acid.

How does text-to-DNA encoding work in this tool?

This tool uses a custom mapping in which each of the four DNA bases A, C, G, T stands for a 2-bit value (A=00, C=01, G=10, T=11). One ASCII character is one byte (8 bits), so it is encoded as 4 bases; the output is then grouped into triplets for readability. This is not a biological standard but a steganographic encoding scheme.

What are the four DNA bases?

The four DNA bases are Adenine (A), Cytosine (C), Guanine (G), and Thymine (T). In RNA, Uracil (U) replaces Thymine. Bases form complementary pairs: A pairs with T (or U in RNA), and C pairs with G.
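The pairing rule can be illustrated with a short Python sketch. The function names here are hypothetical and not part of this tool; the reverse complement is included on the assumption that the two strands of a double helix are read in opposite directions.

```python
# Complementary base pairs for DNA: A<->T and C<->G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(seq: str) -> str:
    """Return the base-by-base complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)


def reverse_complement(seq: str) -> str:
    """Return the complement reversed, i.e. the opposite strand read in its own direction."""
    return complement(seq)[::-1]


def to_rna(seq: str) -> str:
    """Write a DNA sequence in the RNA alphabet, where U replaces T."""
    return seq.upper().replace("T", "U")
```

For instance, complement("ATGC") pairs A with T, T with A, G with C, and C with G.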

What is the standard genetic codon table?

The standard genetic code maps 64 codons to 20 amino acids and 3 stop codons (TAA, TAG, TGA). For example, ATG codes for Methionine (the start codon), TTT and TTC code for Phenylalanine, and TAA is a stop codon. The code is nearly universal across all known life on Earth.

Can DNA actually store digital data?

Yes. DNA data storage is an active research field. Scientists at Microsoft, Harvard, and other institutions have demonstrated storing megabytes of digital data in synthetic DNA strands. DNA offers extraordinary density — theoretically 1 exabyte (10^18 bytes) per cubic centimeter — and can last thousands of years in cold, dry conditions.

DNA Sequence Encoder

Encode text to DNA codon sequences (A/C/G/T), decode DNA back to text, or translate codons to amino acids.

How to Use the DNA Encoder

Text→DNA — type any text. Each character is encoded as a sequence of A/C/G/T bases using a 2-bit per base scheme (4 bases per character, since one byte is 8 bits).
DNA→Text — paste a DNA sequence to decode it back to the original text.
Amino Acids — input any DNA/RNA sequence and translate it using the standard genetic codon table (triplet codons → amino acid single-letter codes).
Codon Table — full reference of all 64 codons and their amino acid mappings.
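The Amino Acids mode described above can be sketched in Python as follows. This is a minimal illustration, not the tool's actual implementation: the names CODON_TABLE and translate are hypothetical, and the choices to keep translating past stop codons (marking them with "*"), to mark unrecognised codons with "X", and to drop a trailing incomplete codon are assumptions.

```python
from itertools import product

# Standard genetic code, listed with bases in the order T, C, A, G
# (first base varies slowest, third base fastest). "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq: str) -> str:
    """Translate a DNA or RNA sequence into single-letter amino acid codes."""
    # Strip whitespace, normalise case, and treat RNA (U) as DNA (T).
    dna = "".join(seq.upper().split()).replace("U", "T")
    # Walk the sequence in triplets, ignoring a trailing partial codon.
    usable = len(dna) - len(dna) % 3
    codons = (dna[i:i + 3] for i in range(0, usable, 3))
    # Unknown codons (e.g. containing N) become "X" (an assumption).
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)
```

With this sketch, translate("ATG TTT TAA") reads three codons: Methionine, Phenylalanine, and a stop.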

DNA and Digital Information

DNA (deoxyribonucleic acid) is the molecule of life, but it is also, fundamentally, a storage medium for digital information. The four nucleotide bases, Adenine (A), Cytosine (C), Guanine (G), and Thymine (T), can be thought of as a 4-symbol alphabet. Two bits can represent four states (00, 01, 10, 11), so each DNA base can carry 2 bits of information. A strand of n bases can therefore hold up to 2n bits of digital data, at a far greater physical density than any electronic storage medium.

This tool uses a simple mapping: A=00, C=01, G=10, T=11. Each character (byte = 8 bits) requires 4 bases in this scheme. For readability, the output is displayed in triplet codons separated by spaces, and the bases are colour-coded (green=A, blue=C, amber=G, red=T).
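The mapping above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the tool's own code: the function names are hypothetical, text is taken as UTF-8 bytes (identical to ASCII for ASCII input), and each byte is assumed to be written most significant bits first, which the description above does not specify.

```python
# Index in this string is the 2-bit value: A=00, C=01, G=10, T=11.
BASE_FOR_BITS = "ACGT"
BITS_FOR_BASE = {base: value for value, base in enumerate(BASE_FOR_BITS)}


def encode(text: str) -> str:
    """Encode text as DNA bases, 4 bases per byte (assumed high bits first)."""
    bases = []
    for byte in text.encode("utf-8"):
        for shift in (6, 4, 2, 0):
            bases.append(BASE_FOR_BITS[(byte >> shift) & 0b11])
    return "".join(bases)


def group_triplets(dna: str) -> str:
    """Group bases in threes for display; the last group may be shorter."""
    return " ".join(dna[i:i + 3] for i in range(0, len(dna), 3))


def decode(dna: str) -> str:
    """Decode a base sequence (whitespace ignored) back to text."""
    bases = "".join(dna.upper().split())
    if len(bases) % 4:
        raise ValueError("expected a multiple of 4 bases")
    out = bytearray()
    for i in range(0, len(bases), 4):
        byte = 0
        for base in bases[i:i + 4]:
            byte = (byte << 2) | BITS_FOR_BASE[base]
        out.append(byte)
    return out.decode("utf-8")
```

Under these assumptions the letter "A" (binary 01000001) becomes CAAC, and "Hi" becomes CAGACGGC, displayed as "CAG ACG GC". Note that the triplet grouping is cosmetic: the groups do not line up with character boundaries.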

The Genetic Code

In biology, the genetic code is the set of rules by which a cell translates information encoded in the DNA (or RNA) sequence of a gene into a sequence of amino acids. A codon is a sequence of three bases. With four possible bases, there are 4^3 = 64 possible codons. These 64 codons map to 20 amino acids and 3 stop signals. The code is nearly universal — with minor exceptions in mitochondria and some microorganisms, all life on Earth uses the same codon table, a strong piece of evidence for the common descent of all known life.
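The counts in this paragraph can be checked with a few lines of Python. The sketch below builds the standard codon table from its conventional compact listing (bases in the order T, C, A, G) and tallies how many codons map to each amino acid; the variable names are illustrative only.

```python
from collections import Counter
from itertools import product

# Standard genetic code in T, C, A, G order; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

codon_table = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# How many codons encode each amino acid (or stop signal).
codons_per_symbol = Counter(codon_table.values())

# The codons that act as stop signals.
stop_codons = sorted(c for c, aa in codon_table.items() if aa == "*")

# Number of distinct amino acids, excluding the stop marker.
amino_acid_count = len(codons_per_symbol) - 1
```

The tally makes the redundancy concrete: Leucine (L) is reached by six different codons, while Methionine (M) and Tryptophan (W) have only one each.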

DNA Data Storage

DNA data storage has moved from theoretical concept to practical demonstration. In 2012, George Church's lab at Harvard stored 5.27 megabits of data in synthetic DNA. Microsoft Research has demonstrated retrieval of specific files from a pool of millions of DNA molecules using PCR-based addressing. The theoretical density of DNA storage is approximately 1 exabyte (10^18 bytes) per cubic centimeter — more than a million times denser than a solid-state drive. DNA also survives for thousands of years in cold, dry conditions, as demonstrated by the successful extraction of genetic information from woolly mammoth bones over 40,000 years old.

Encoding Schemes

This tool uses a simple 2-bit per base encoding. More sophisticated DNA data storage schemes use error correction codes to handle synthesis and sequencing errors, and they deliberately avoid runs of identical bases (homopolymers) that are difficult to synthesize and sequence accurately. The Grass et al. (2015) paper introduced Reed-Solomon error correction into DNA storage, enabling recovery of data even when up to 10% of bases are corrupted.
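To see why a plain 2-bit mapping can produce homopolymers, a sequence can be scanned for its longest run of identical bases. The following Python sketch is illustrative only; the function name is hypothetical, and what run length counts as problematic depends on the synthesis and sequencing technology, so no threshold is assumed here.

```python
from itertools import groupby


def longest_homopolymer(dna: str) -> tuple:
    """Return (base, length) of the longest run of one repeated base."""
    bases = "".join(dna.upper().split())
    best = ("", 0)
    # groupby yields one group per maximal run of equal characters.
    for base, run in groupby(bases):
        length = sum(1 for _ in run)
        if length > best[1]:
            best = (base, length)
    return best
```

For example, with A=00 and G=10 and high bits written first (an assumption), a space character (binary 00100000) encodes as AGAA, so two consecutive spaces give AGAAAGAA, which already contains a run of three A bases.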
