Zero-Width Hidden Message Encoder

Hide a secret message inside visible text using invisible zero-width Unicode characters. Decode any text to reveal hidden messages.

Step 1 — Convert secret message to binary

Each character in the secret message is converted to its UTF-8 byte representation (one byte for ASCII characters, two to four bytes for others), and each byte is written as 8 binary digits. For example, the letter A (ASCII 65, or 0x41) becomes 01000001.
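
A minimal JavaScript sketch of this step, assuming an environment with the standard TextEncoder (modern browsers and Node.js). The name textToBits is hypothetical and not taken from this tool's source.

```javascript
// Step 1 sketch: secret message -> UTF-8 bytes -> one 8-digit binary string.
// textToBits is an illustrative name, not this tool's actual code.
function textToBits(secret) {
  const bytes = new TextEncoder().encode(secret); // Uint8Array of UTF-8 bytes
  // Write each byte as exactly 8 binary digits, padding with leading zeros.
  return Array.from(bytes, (b) => b.toString(2).padStart(8, "0")).join("");
}

console.log(textToBits("A"));             // 01000001
console.log(textToBits("\u00e9").length); // 16, because é is 2 bytes in UTF-8
```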

Step 2 — Map binary to zero-width characters

Each binary digit is replaced with an invisible Unicode character:

Bit 0 → ZWS, U+200B (Zero Width Space)
Bit 1 → ZWNJ, U+200C (Zero Width Non-Joiner)

The letter A (01000001) becomes: ZWS ZWNJ ZWS ZWS ZWS ZWS ZWS ZWNJ
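
The mapping in Step 2 can be sketched as below. The constants ZERO and ONE and the function bitsToZeroWidth are illustrative names; the code prints code points because the result itself is invisible.

```javascript
// Step 2 sketch: 0 -> U+200B (ZWSP), 1 -> U+200C (ZWNJ).
const ZERO = "\u200B";
const ONE = "\u200C";

function bitsToZeroWidth(bits) {
  let out = "";
  for (const bit of bits) {
    if (bit !== "0" && bit !== "1") throw new Error(`not a bit: ${bit}`);
    out += bit === "0" ? ZERO : ONE;
  }
  return out;
}

// The output is invisible, so list its code points instead of printing it raw.
const encodedA = bitsToZeroWidth("01000001");
console.log(
  Array.from(encodedA, (c) => "U+" + c.codePointAt(0).toString(16).toUpperCase()).join(" ")
);
// U+200B U+200C U+200B U+200B U+200B U+200B U+200B U+200C
```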

Step 3 — Insert between carrier text characters

The zero-width sequence is interleaved between the characters of the visible carrier text. Because zero-width characters have no visible width, the rendered carrier looks unchanged to a reader, even though its underlying character sequence now contains the complete hidden message.
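
A sketch of one possible placement strategy. The exact distribution this tool uses is not specified, so the rule here (one zero-width character after each visible character, remainder appended to the end) is an assumption; interleave is a hypothetical name.

```javascript
// Step 3 sketch under an assumed placement rule: one zero-width character
// in each gap between visible characters, leftovers appended at the end.
function interleave(carrier, hidden) {
  const visible = Array.from(carrier); // split by code point, not UTF-16 unit
  const zw = Array.from(hidden);
  let out = visible[0] ?? "";
  let k = 0;
  for (let i = 1; i < visible.length; i++) {
    if (k < zw.length) out += zw[k++]; // fill the gap before visible[i]
    out += visible[i];
  }
  return out + zw.slice(k).join(""); // any remaining zero-width characters
}

const hidden = "\u200B\u200C\u200B\u200B\u200B\u200B\u200B\u200C"; // "A" from Step 2
const stego = interleave("Hello, world", hidden);
console.log(stego);        // renders as "Hello, world"
console.log(stego.length); // 20 (12 visible + 8 zero-width)
```

Splitting by code point keeps surrogate pairs intact, but it can still separate a base letter from a combining mark or break an emoji sequence; inserting only at grapheme boundaries (for example with Intl.Segmenter) avoids that.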

Step 4 — Decode by extracting zero-width characters

To recover the hidden message, the decoder scans the text for ZWS (U+200B) and ZWNJ (U+200C) characters, collects them in order, maps them back to 0 and 1, groups the bits into 8-bit chunks, converts each chunk to a byte value, and decodes the resulting bytes as UTF-8 text.
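
The decoding steps can be sketched as follows, assuming the 0 → U+200B, 1 → U+200C mapping above; decodeHidden is a hypothetical name.

```javascript
// Step 4 sketch: extract ZWSP/ZWNJ in order, regroup into bytes, decode UTF-8.
function decodeHidden(text) {
  const bits = Array.from(text)
    .filter((c) => c === "\u200B" || c === "\u200C") // ignore all visible text
    .map((c) => (c === "\u200B" ? "0" : "1"))
    .join("");
  const bytes = new Uint8Array(Math.floor(bits.length / 8)); // drop a trailing partial byte
  for (let i = 0; i < bytes.length; i++) {
    bytes[i] = parseInt(bits.slice(i * 8, i * 8 + 8), 2);
  }
  // fatal: true makes malformed UTF-8 throw instead of yielding U+FFFD.
  return new TextDecoder("utf-8", { fatal: true }).decode(bytes);
}

const sample = "H\u200Be\u200Cl\u200Bl\u200Bo\u200B\u200B\u200B\u200C";
console.log(decodeHidden(sample)); // A
```

Because the decoder reads every ZWS and ZWNJ in the text, any such characters already present in the carrier would corrupt the bit stream; that is one reason more robust schemes add delimiters or checksums.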

Capacity

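Each byte of the UTF-8-encoded secret requires 8 zero-width characters: 8 for an ASCII character and 16 to 32 for a character outside ASCII. So a 10-character ASCII message needs 80 zero-width characters. These are distributed between characters of the carrier text. If the encoder places at most one zero-width character in each gap, the carrier needs at least as many gaps between visible characters as there are zero-width characters; otherwise several can share a gap, or all of them can be appended to the end.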
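
The capacity arithmetic as a small sketch. It assumes the one-zero-width-character-per-gap placement discussed above; capacity is a hypothetical helper name.

```javascript
// Capacity of the one-bit-per-character scheme: 8 zero-width characters per UTF-8 byte.
// Assumes at most one zero-width character per gap between visible characters.
function capacity(secret, carrier) {
  const secretBytes = new TextEncoder().encode(secret).length;
  const zeroWidthChars = secretBytes * 8;
  // n visible characters (code points) leave n - 1 gaps between them.
  const gaps = Math.max(Array.from(carrier).length - 1, 0);
  return {
    zeroWidthChars,
    addedUtf8Bytes: zeroWidthChars * 3, // U+200B and U+200C are 3 bytes each in UTF-8
    fitsOnePerGap: zeroWidthChars <= gaps,
  };
}

const report = capacity("secret1234", "A short carrier sentence.");
console.log(report);
// 80 zero-width characters and 240 extra bytes; the carrier has only 24 gaps,
// so fitsOnePerGap is false and the rest must share gaps or go at the end.
```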

Detection

This encoding can be detected by examining raw bytes, checking character counts, or using a Unicode property inspector. Security tools and document analysis software can detect zero-width characters. This tool is provided for educational and legitimate watermarking purposes only.

What Are Zero-Width Characters?

Zero-width characters are Unicode code points that have no visible rendering — they take up no horizontal space in text. The most important ones are Zero Width Space (U+200B), Zero Width Non-Joiner (U+200C), Zero Width Joiner (U+200D), and Zero Width No-Break Space (U+FEFF, also the Byte Order Mark). While these characters were designed for legitimate typographic purposes — ZWJ joins emoji components, ZWNJ prevents unwanted ligatures in scripts like Devanagari — their invisibility makes them usable for information hiding.
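
A quick way to inspect these four characters in JavaScript, assuming a runtime that supports Unicode property escapes (\p{...} with the u flag). It shows their code points, that each takes 3 bytes in UTF-8, and that each belongs to general category Cf (Format).

```javascript
// Inspect the four zero-width characters named above.
const zeroWidth = {
  "ZERO WIDTH SPACE": "\u200B",
  "ZERO WIDTH NON-JOINER": "\u200C",
  "ZERO WIDTH JOINER": "\u200D",
  "ZERO WIDTH NO-BREAK SPACE (BOM)": "\uFEFF",
};

for (const [name, ch] of Object.entries(zeroWidth)) {
  const cp = "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0");
  const utf8Bytes = new TextEncoder().encode(ch).length;
  const isFormat = /\p{Cf}/u.test(ch); // general category Cf = Format
  console.log(`${cp} ${name}: ${utf8Bytes} bytes in UTF-8, Cf=${isFormat}`);
}
// Each line reports 3 bytes in UTF-8 and Cf=true.
```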

Unicode Steganography

Text steganography using zero-width characters exploits the fact that these characters are invisible to readers but present in the underlying byte sequence. By mapping the binary representation of a secret message to sequences of zero-width characters and inserting them into ordinary text, a hidden message can be embedded in plain sight. The technique works in any medium that preserves the exact Unicode code points. Email, documents, web pages, and many social media platforms often pass zero-width characters through unchanged, but some platforms and sanitizers strip them, which destroys the hidden message.

Legitimate Use Cases

The most practical legitimate application of zero-width steganography is document watermarking. A confidential document distributed to multiple recipients can contain a unique hidden message for each recipient. If the document is leaked, the watermark identifies the source. The approach suits any organization that needs to track document provenance without visible markings, such as law firms, financial institutions, or government agencies. The hidden message can include a timestamp, recipient identifier, or any other provenance information.
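
A hypothetical sketch of per-recipient watermarking using the same 0 → U+200B, 1 → U+200C scheme. The recipient IDs, the memo text, and placing the whole mark after the first word are illustrative assumptions, not how any particular organization does it.

```javascript
// Encode a recipient ID as zero-width characters (8 per UTF-8 byte).
function toZeroWidth(message) {
  return Array.from(new TextEncoder().encode(message), (b) => b.toString(2).padStart(8, "0"))
    .join("")
    .replace(/0/g, "\u200B")
    .replace(/1/g, "\u200C");
}

// Assumed placement: the whole mark goes right after the first word.
function watermark(doc, recipientId) {
  const space = doc.indexOf(" ");
  const at = space === -1 ? doc.length : space;
  return doc.slice(0, at) + toZeroWidth(recipientId) + doc.slice(at);
}

// Recover the recipient ID from a (possibly leaked) copy.
function readWatermark(text) {
  const bits = Array.from(text)
    .filter((c) => c === "\u200B" || c === "\u200C")
    .map((c) => (c === "\u200B" ? "0" : "1"))
    .join("");
  const bytes = Uint8Array.from(bits.match(/.{8}/g) ?? [], (chunk) => parseInt(chunk, 2));
  return new TextDecoder().decode(bytes);
}

const memo = "Quarterly results are confidential.";
const copies = ["r-017", "r-042"].map((id) => watermark(memo, id));
console.log(copies[0] === copies[1]); // false: the copies differ invisibly
console.log(readWatermark(copies[1])); // r-042
```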

Security Implications

Zero-width characters pose several security risks beyond steganography. They can be used to bypass content filters and keyword detection systems: "hello" and "hel\u200Blo" (the same word with a U+200B inserted) render identically but are different strings. Phishing attacks can use zero-width characters to make malicious text pass filters while appearing legitimate to users. In code review, invisible characters can make malicious code look benign; JavaScript, for instance, allows ZWNJ and ZWJ inside identifiers, so two names that render identically can refer to different variables. The Trojan Source attack showed how bidirectional control characters, which are themselves invisible, can make source code display differently from the logic the compiler actually processes.
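
The filter-bypass problem and one mitigation, sketched in JavaScript. The blocklist is hypothetical. Unicode normalization (String.prototype.normalize, even NFKC) does not remove these characters, so the sketch strips category Cf characters explicitly.

```javascript
// A naive keyword filter misses a word split by an invisible U+200B.
const blocked = ["hello"];          // hypothetical blocklist
const input = "hel\u200Blo";         // renders as "hello"

const naiveHit = blocked.some((w) => input.includes(w));

// Strip every format (Cf) character before matching. This also removes
// ZWJ/ZWNJ that emoji sequences and some scripts rely on, so apply it only
// to a copy used for comparison, not to text you display or store.
const normalized = input.replace(/\p{Cf}/gu, "");
const normalizedHit = blocked.some((w) => normalized.includes(w));

console.log(input === "hello", naiveHit, normalizedHit); // false false true
```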

Detecting Zero-Width Characters

To detect zero-width characters in text, use our Unicode Character Inspector, which shows every character including invisible ones. In JavaScript, you can test for the common zero-width characters with the regex /[\u200B\u200C\u200D\uFEFF]/. For broader coverage, match any character in Unicode general category Cf (Format), which also includes U+2060 Word Joiner and the bidirectional control characters; in JavaScript that is /\p{Cf}/u. VS Code can highlight invisible Unicode characters in the editor (see the editor.unicodeHighlight.invisibleCharacters setting). On Unix, wc -c counts bytes and, in a UTF-8 locale, wc -m counts characters; each of these zero-width characters occupies 3 bytes in UTF-8, so both counts come out higher than the visible text suggests.
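
Beyond a yes/no regex test, a small helper can report where each invisible character sits. findInvisible is a hypothetical name; because it uses the \p{Cf} property escape, it also flags bidirectional controls and U+2060, not only the four characters listed above.

```javascript
// Report every format (Cf) character with its code point index.
function findInvisible(text) {
  const hits = [];
  let index = 0; // counts code points, not UTF-16 units
  for (const ch of text) {
    if (/\p{Cf}/u.test(ch)) {
      const codePoint = "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0");
      hits.push({ index, codePoint });
    }
    index++;
  }
  return hits;
}

console.log(findInvisible("in\u200Bvis\u200Dible"));
// Two hits: U+200B at index 2 and U+200D at index 6.
```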

Encoding Capacity and Limitations

The binary encoding used here represents each byte of the UTF-8-encoded secret as 8 bits, requiring 8 zero-width characters per byte. For ASCII text (1 byte per character), a 100-character secret message requires 800 zero-width characters, which occupy 2,400 bytes in UTF-8 because each zero-width character takes 3 bytes. Characters outside ASCII take 2 to 4 bytes in UTF-8, so each needs 16 to 32 zero-width characters. While zero-width characters are invisible, their presence can be inferred from an unusually large file size or character count. More advanced steganographic schemes use more zero-width character types to pack more bits into each character (four types carry 2 bits each) or employ error-correcting codes for robustness.
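
For comparison, a sketch of a denser hypothetical variant that uses four zero-width characters to carry 2 bits each, so a byte needs 4 characters instead of 8. This is not the scheme this tool uses, and the symbol assignment (including U+2060 Word Joiner in place of U+FEFF, which software may treat as a byte order mark) is an arbitrary illustration.

```javascript
// Hypothetical 2-bits-per-character alphabet: index = 2-bit value.
const SYMBOLS = ["\u200B", "\u200C", "\u200D", "\u2060"]; // 00, 01, 10, 11

function encode2(message) {
  let out = "";
  for (const byte of new TextEncoder().encode(message)) {
    for (let shift = 6; shift >= 0; shift -= 2) {
      out += SYMBOLS[(byte >> shift) & 0b11]; // 2 bits, most significant first
    }
  }
  return out;
}

function decode2(text) {
  const digits = Array.from(text)
    .map((c) => SYMBOLS.indexOf(c))
    .filter((v) => v >= 0); // keep only symbols from the alphabet
  const bytes = new Uint8Array(Math.floor(digits.length / 4));
  for (let i = 0; i < bytes.length; i++) {
    const [a, b, c, d] = digits.slice(i * 4, i * 4 + 4);
    bytes[i] = (a << 6) | (b << 4) | (c << 2) | d;
  }
  return new TextDecoder().decode(bytes);
}

const packed = encode2("Hi");
console.log(packed.length);               // 8 (the 1-bit scheme needs 16)
console.log(decode2("x" + packed + "y")); // Hi
```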

Frequently Asked Questions

What are zero-width characters?

Zero-width characters are Unicode code points with no visible width when rendered. Key ones include Zero Width Space (U+200B), Zero Width Non-Joiner (U+200C), Zero Width Joiner (U+200D), and Zero Width No-Break Space (U+FEFF). They are invisible to readers but present in the raw byte sequence of the text.

How does zero-width steganography work?

Each character of the secret message is converted to its UTF-8 bytes, and each byte to 8 binary digits. Bit 0 maps to Zero Width Space (U+200B) and bit 1 maps to Zero Width Non-Joiner (U+200C). These invisible characters are inserted between characters of the visible carrier text. To decode, extract all ZWS/ZWNJ characters in order, group them into 8-bit chunks, and decode the resulting bytes as UTF-8 text.

Can zero-width characters be detected?

Yes. Zero-width characters can be detected by examining the raw bytes or character count of the text, using a Unicode inspector, or with regex patterns matching Unicode category Cf. The text will contain more characters than are visible. Security tools, document analysis software, and careful manual inspection can all reveal hidden zero-width characters.

What is Unicode steganography used for?

Legitimate uses include watermarking documents to identify the source of leaks, adding invisible provenance metadata to text, and security research. Malicious uses include bypassing content filters (since invisible characters alter string comparisons without being visible), embedding covert communication channels, and the Trojan Source attack on source code. This tool is for educational and legitimate purposes only.

Does encoding change the visible text?

No. The visible text appears completely unchanged. The hidden message is encoded entirely in zero-width characters inserted between visible characters. The only detectable differences are a longer byte sequence, a higher character count, and potential issues with text processing that does not expect non-printable characters.