DNA Sequence Encoder
Encode text to DNA codon sequences (A/C/T/G), decode DNA back to text, or translate codons to amino acids.
How to Use the DNA Encoder
- Text→DNA: type any text. Each character is encoded as A/C/G/T bases using a 2-bits-per-base scheme (4 bases per 8-bit byte).
- DNA→Text: paste a DNA sequence to decode it back to the original text.
- Amino Acids: input any DNA or RNA sequence and translate it with the standard genetic code (triplet codons → single-letter amino acid codes).
- Codon Table: full reference of all 64 codons and their amino acid mappings.
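The Amino Acids mode can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the tool's own code: translation starts at the first base (reading frame 0), `U` is treated as `T` so RNA input works, stop codons are shown as `*` rather than ending translation, and a trailing incomplete codon is dropped. The 64-letter string is the standard code listed in T, C, A, G order, the layout NCBI uses for its translation table 1; the name `translate` is hypothetical.

```python
# Hypothetical sketch of DNA/RNA -> amino acid translation (standard code).
# Assumptions: reading frame 0, '*' for stop codons, partial codon dropped.
from itertools import product

BASES = "TCAG"  # codon order: first base varies slowest, third base fastest
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq: str) -> str:
    """Translate whole codons from the first base; unknown codons raise KeyError."""
    dna = "".join(seq.split()).upper().replace("U", "T")  # drop spaces, RNA -> DNA
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))
```

For example, `translate("AUG GCC UAA")` returns `"MA*"` (methionine, alanine, stop).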
DNA and Digital Information
DNA (deoxyribonucleic acid) is the molecule of life, but it is also, fundamentally, a storage medium for digital information. The four nucleotide bases, Adenine (A), Cytosine (C), Guanine (G), and Thymine (T), can be treated as a 4-symbol alphabet. Two bits can represent four states (00, 01, 10, 11), so each base can carry up to 2 bits of information. A DNA strand can therefore represent any binary data without loss, at a physical density far greater than that of any electronic storage medium.
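The capacity argument is a short calculation: four symbols carry log2 4 bits each, so four bases cover exactly the 256 values of one 8-bit byte.

```latex
\begin{aligned}
\text{bits per base} &= \log_2 4 = 2 \\
\text{bases per byte} &= \frac{8\ \text{bits}}{2\ \text{bits/base}} = 4 \\
\text{distinct 4-base strings} &= 4^4 = 256 = 2^8
\end{aligned}
```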
This tool uses a simple mapping: A=00, C=01, G=10, T=11. Each character (byte = 8 bits) requires 4 bases in this scheme. For readability, the output is displayed in triplet codons separated by spaces, and the bases are colour-coded (green=A, blue=C, amber=G, red=T).
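A minimal Python sketch of this mapping, not the tool's actual source. Assumptions: text is converted to bytes with UTF-8, each byte's bit pairs are emitted most significant first, and the decoder accepts the space-separated triplet display as input. The function names are hypothetical.

```python
# Hypothetical sketch of the A=00, C=01, G=10, T=11 text <-> DNA mapping.
# Assumption: text is converted to bytes with UTF-8 before encoding.
BASE_FOR_BITS = {0b00: "A", 0b01: "C", 0b10: "G", 0b11: "T"}
BITS_FOR_BASE = {base: bits for bits, base in BASE_FOR_BITS.items()}


def text_to_dna(text: str) -> str:
    """Encode each byte as 4 bases, highest bit pair first."""
    bases = []
    for byte in text.encode("utf-8"):
        for shift in (6, 4, 2, 0):
            bases.append(BASE_FOR_BITS[(byte >> shift) & 0b11])
    return "".join(bases)


def group_codons(dna: str) -> str:
    """Split a base string into space-separated triplets for display."""
    return " ".join(dna[i:i + 3] for i in range(0, len(dna), 3))


def dna_to_text(dna: str) -> str:
    """Decode bases back to text; whitespace is ignored, other symbols rejected."""
    bases = "".join(dna.split()).upper()
    if any(b not in BITS_FOR_BASE for b in bases):
        raise ValueError("sequence may contain only A, C, G, T and whitespace")
    if len(bases) % 4 != 0:
        raise ValueError("base count must be a multiple of 4")
    data = bytearray()
    for i in range(0, len(bases), 4):
        byte = 0
        for base in bases[i:i + 4]:
            byte = (byte << 2) | BITS_FOR_BASE[base]  # shift in 2 bits per base
        data.append(byte)
    return data.decode("utf-8")
```

For example, `H` is 0x48 = 01 00 10 00, giving `CAGA`, so `"Hi"` encodes to `CAGACGGC` and displays as `CAG ACG GC`. Because 4 bases per byte does not divide evenly by 3, the display triplets do not line up with character boundaries.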
The Genetic Code
In biology, the genetic code is the set of rules by which a cell translates information encoded in the DNA (or RNA) sequence of a gene into a sequence of amino acids. A codon is a sequence of three bases. With four possible bases, there are 4^3 = 64 possible codons. In the standard code, 61 of these specify the 20 standard amino acids and the remaining 3 are stop signals. The code is nearly universal: apart from minor variations in mitochondria and some microorganisms, all life on Earth uses the same codon table, which is strong evidence for the common descent of all known life.
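The counts above can be checked by tallying the standard table directly. This sketch assumes the standard code written as a 64-letter string in T, C, A, G codon order, with `*` marking stop codons; the variable names are illustrative.

```python
# Sketch: tally the standard genetic code (assumed T, C, A, G codon order).
from collections import Counter
from itertools import product

AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AMINO_ACIDS)}

stop_codons = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")
sense_codons = [c for c, aa in CODON_TABLE.items() if aa != "*"]
codons_per_amino_acid = Counter(aa for aa in CODON_TABLE.values() if aa != "*")

print(len(CODON_TABLE))            # 64
print(len(sense_codons))           # 61
print(len(codons_per_amino_acid))  # 20
print(stop_codons)                 # ['TAA', 'TAG', 'TGA']
print(codons_per_amino_acid["L"], codons_per_amino_acid["M"])  # 6 1
```

The last line shows the code's redundancy: leucine has six codons, while methionine has only one (`ATG`).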
DNA Data Storage
DNA data storage has moved from theoretical concept to practical demonstration. In 2012, George Church's lab at Harvard stored 5.27 megabits of data in synthetic DNA. Microsoft Research has demonstrated retrieval of specific files from a pool of millions of DNA molecules using PCR-based addressing. The theoretical density of DNA storage is approximately 1 exabyte (10^18 bytes) per cubic centimeter — more than a million times denser than a solid-state drive. DNA also survives for thousands of years in cold, dry conditions, as demonstrated by the successful extraction of genetic information from woolly mammoth bones over 40,000 years old.
Encoding Schemes
This tool uses a simple 2-bits-per-base encoding. More sophisticated DNA data storage schemes add error-correcting codes to handle synthesis and sequencing errors, and they deliberately avoid runs of identical bases (homopolymers), which are difficult to synthesize and sequence accurately. Grass et al. (2015) introduced Reed-Solomon error correction into DNA storage, enabling recovery of data even when up to 10% of bases are corrupted.
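The homopolymer problem is easy to reproduce with the simple 2-bits-per-base mapping. This hypothetical sketch (UTF-8 input assumed, function names invented for illustration) encodes text and reports the longest run of a single base; it detects the problem but does not implement any avoidance scheme.

```python
# Sketch: longest homopolymer run produced by the plain 2-bit byte encoding.
# Assumption: text is UTF-8 encoded; bases are emitted high bit pair first.
from itertools import groupby

BASES = "ACGT"  # index equals the 2-bit value: A=00, C=01, G=10, T=11


def encode_2bit(text: str) -> str:
    """Encode each byte as 4 bases."""
    return "".join(
        BASES[(byte >> shift) & 0b11]
        for byte in text.encode("utf-8")
        for shift in (6, 4, 2, 0)
    )


def longest_homopolymer(seq: str) -> int:
    """Length of the longest run of one repeated base (0 for an empty string)."""
    return max((len(list(run)) for _, run in groupby(seq)), default=0)
```

`U` is byte 0x55 (01 01 01 01), so `encode_2bit("UUU")` is twelve consecutive `C`s and `longest_homopolymer` reports 12; a scheme that avoids homopolymers would have to re-encode such input.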