Zero-Width Character Detector

Find, highlight, and remove invisible zero-width Unicode characters — ZWS, ZWNJ, ZWJ, BOM, and more.

How to Use the Zero-Width Character Detector

  1. Paste your text into the input area on the left. Zero-width characters are invisible, so text containing them looks normal.
  2. Detect mode — the tool highlights every zero-width character with a red badge showing its Unicode code point, and lists all findings below the output with position and name.
  3. Remove mode — strips all zero-width characters and outputs clean text ready to copy or download.
  4. Insert mode — adds zero-width characters to your text for line-break control or document watermarking.
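The Remove and Insert steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the tool's actual implementation: the helper names `remove_zero_width` and `insert_between_chars` are hypothetical, and the character set is copied from the reference list further down this page.

```python
# Hypothetical helpers; the character set mirrors the list documented on this page.
ZERO_WIDTH = (
    "\u200b\u200c\u200d\ufeff\u2060\u2061\u2062\u2063\u2064"
    "\u00ad\u034f\u180e"
)

# str.translate deletes every code point that maps to None.
_DELETE_TABLE = {ord(ch): None for ch in ZERO_WIDTH}


def remove_zero_width(text: str) -> str:
    """Return text with all listed zero-width characters stripped."""
    return text.translate(_DELETE_TABLE)


def insert_between_chars(text: str, mark: str = "\u200b") -> str:
    """Insert a zero-width space between every pair of characters."""
    return mark.join(text)


marked = insert_between_chars("longword")
print(len("longword"), len(marked))          # 8 15
print(remove_zero_width(marked) == "longword")  # True
```

Note that stripping U+200D unconditionally also breaks emoji sequences that rely on it, so a real cleanup step may want to treat emoji separately.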

What Are Zero-Width Characters?

Zero-width characters are Unicode code points that occupy no horizontal space when rendered. They are invisible in virtually all fonts and rendering environments, which makes them useful for legitimate typography and equally convenient for malicious obfuscation. The most common zero-width characters are:

  • Zero-Width Space (U+200B): allows line breaks within words
  • Zero-Width Non-Joiner (U+200C): prevents character joining in Arabic and Indic scripts
  • Zero-Width Joiner (U+200D): causes adjacent characters to join (used in emoji sequences like family emojis)
  • Byte Order Mark (U+FEFF): also known as Zero-Width No-Break Space
  • Word Joiner (U+2060): prevents line breaks without adding space
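To make the invisibility concrete, the short Python sketch below compares a plain string with one that hides a Zero-Width Space, then lists the code points inside a family emoji built with Zero-Width Joiners. The helper `code_points` is a hypothetical name used only for this illustration.

```python
import unicodedata


def code_points(text: str) -> list[str]:
    """Return each character of text as a 'U+XXXX' label."""
    return [f"U+{ord(ch):04X}" for ch in text]


plain = "paypal"
hidden = "pay\u200bpal"  # Zero-Width Space between "pay" and "pal"

print(plain == hidden)          # False
print(len(plain), len(hidden))  # 6 7

# Family emoji: MAN + ZWJ + WOMAN + ZWJ + GIRL
family = "\U0001F468\u200d\U0001F469\u200d\U0001F467"
for ch in family:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
```

Both strings render the same in most environments, yet they compare unequal and differ in length by one.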

Security Implications

Zero-width characters have become a significant security concern in several contexts. In phishing attacks, they are inserted into domain names and URLs so that an address looks identical to a legitimate one while actually pointing to a different server. In social engineering and spam, they are inserted into names, keywords, and sensitive strings to bypass automated keyword filters and spam detectors. In leak tracking, the risk runs the other way: anyone who removes the characters also strips the watermark that was embedded to trace the document. This tool helps security researchers and content creators audit their text for these hidden characters.
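The filter-bypass case can be illustrated with a toy keyword filter in Python. Everything here is an assumption made for the example: the blocklist, the function names, and the choice to normalize by deleting a small set of zero-width characters before matching.

```python
import re

# A small subset of zero-width characters, written as escapes so they stay visible.
ZW_PATTERN = re.compile("[\u200b\u200c\u200d\u2060\ufeff\u00ad]")

BLOCKLIST = {"password", "invoice"}  # hypothetical keywords


def naive_filter(text: str) -> bool:
    """Flag text that contains a blocklisted word as a plain substring."""
    lowered = text.lower()
    return any(word in lowered for word in BLOCKLIST)


def hardened_filter(text: str) -> bool:
    """Strip zero-width characters first, then apply the same check."""
    return naive_filter(ZW_PATTERN.sub("", text))


msg = "Send your pass\u200bword now"
print(naive_filter(msg))     # False: the hidden character splits the keyword
print(hardened_filter(msg))  # True
```

The point of the sketch is the ordering: normalization has to happen before matching, otherwise a single invisible character defeats the blocklist.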

Zero-Width Characters in Programming

Zero-width characters in source code are a particularly dangerous attack vector. A malicious contribution could insert zero-width characters into string literals, variable names, or comments so that the code looks correct during review but behaves differently than expected. For example, a Zero-Width Joiner between two characters in a string literal means a comparison against the expected value never matches. Some IDEs and code editors do not highlight these characters by default, making them hard to spot during review. Always run untrusted code contributions through a zero-width character detector before merging.
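A pre-merge check along these lines could look like the Python sketch below. It is one possible approach, not the tool's method: instead of a fixed list it flags every character in Unicode general category `Cf` (format), which covers ZWS, ZWNJ, ZWJ, BOM, and Word Joiner as well as bidirectional control characters, but does not cover U+034F. The function name `scan_source` is hypothetical.

```python
import unicodedata


def scan_source(source: str) -> list[tuple[int, int, str]]:
    """Return (line, column, code point) for each format character, 1-based."""
    findings = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            # "Cf" is the Unicode general category for invisible format characters.
            if unicodedata.category(ch) == "Cf":
                findings.append((line_no, col, f"U+{ord(ch):04X}"))
    return findings


# The second line hides a Zero-Width Joiner inside the string literal.
snippet = 'role = "admin"\nif role == "ad\u200dmin":\n    grant()\n'

print("admin" == "ad\u200dmin")  # False: the comparison can never succeed
print(scan_source(snippet))      # [(2, 15, 'U+200D')]
```

A check like this could run in CI and fail the build when the list of findings is not empty.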

Insert Mode: Watermarking with Zero-Width Characters

The Insert mode allows you to embed zero-width characters into text — a technique used for document fingerprinting and leak detection. The idea is simple: different combinations of ZWS (U+200B) and ZWNJ (U+200C) can be used to encode binary data. By assigning different zero-width character patterns to different recipients of a sensitive document, the document's origin can be traced if it leaks, even after formatting changes or copy-paste operations that preserve the invisible characters. Note that sophisticated actors can detect and strip these watermarks with tools like this one, so zero-width watermarking is most effective against unsophisticated leakers.
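The encoding idea can be sketched in Python as follows. The details are assumptions chosen for illustration and may differ from what Insert mode actually does: ZWS stands for bit 0, ZWNJ for bit 1, the recipient ID is a fixed-width integer, and the mark is placed after the first word. The function names are hypothetical.

```python
ZERO, ONE = "\u200b", "\u200c"  # assumed mapping: ZWS = 0, ZWNJ = 1


def encode_id(recipient_id: int, bits: int = 8) -> str:
    """Encode an integer as a fixed-width run of zero-width characters."""
    if not 0 <= recipient_id < 2 ** bits:
        raise ValueError("recipient_id does not fit in the given bit width")
    binary = format(recipient_id, f"0{bits}b")
    return "".join(ONE if bit == "1" else ZERO for bit in binary)


def embed(text: str, recipient_id: int, bits: int = 8) -> str:
    """Hide the encoded ID directly after the first word of text."""
    head, sep, tail = text.partition(" ")
    return head + encode_id(recipient_id, bits) + sep + tail


def extract(text: str):
    """Recover the ID, assuming the cover text held no ZWS or ZWNJ of its own."""
    bits = "".join("1" if ch == ONE else "0" for ch in text if ch in (ZERO, ONE))
    return int(bits, 2) if bits else None


marked = embed("Quarterly report draft", 37)
print(extract(marked))  # 37
print(extract("Quarterly report draft"))  # None
```

Each recipient would receive a copy marked with a different ID; running `extract` on a leaked copy then points back to that recipient, provided the invisible characters survived.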

Detected Character Reference

This tool detects the following zero-width and invisible Unicode characters: U+200B Zero-Width Space, U+200C Zero-Width Non-Joiner, U+200D Zero-Width Joiner, U+FEFF Byte Order Mark (BOM) / Zero-Width No-Break Space, U+2060 Word Joiner, U+2061 Function Application, U+2062 Invisible Times, U+2063 Invisible Separator, U+2064 Invisible Plus, U+00AD Soft Hyphen, U+034F Combining Grapheme Joiner, and U+180E Mongolian Vowel Separator.
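The reference list above maps naturally onto a lookup table. The Python sketch below shows one way Detect mode could be approximated; it is not the tool's own code. The function name `detect` is hypothetical, and positions are reported as 0-based code point indexes, which is an assumption about the output format.

```python
# Code points and names taken from the reference list on this page.
ZERO_WIDTH_NAMES = {
    "\u200b": "Zero-Width Space",
    "\u200c": "Zero-Width Non-Joiner",
    "\u200d": "Zero-Width Joiner",
    "\ufeff": "Byte Order Mark (BOM) / Zero-Width No-Break Space",
    "\u2060": "Word Joiner",
    "\u2061": "Function Application",
    "\u2062": "Invisible Times",
    "\u2063": "Invisible Separator",
    "\u2064": "Invisible Plus",
    "\u00ad": "Soft Hyphen",
    "\u034f": "Combining Grapheme Joiner",
    "\u180e": "Mongolian Vowel Separator",
}


def detect(text: str) -> list[tuple[int, str, str]]:
    """Return (position, code point, name) for every listed character."""
    return [
        (index, f"U+{ord(ch):04X}", ZERO_WIDTH_NAMES[ch])
        for index, ch in enumerate(text)
        if ch in ZERO_WIDTH_NAMES
    ]


sample = "\ufeffHello\u200b world\u00ad"
for position, code_point, name in detect(sample):
    print(position, code_point, name)
```

For the sample string this reports three findings, at positions 0, 6, and 13.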

Common Sources of Zero-Width Characters

  • Copy from Twitter/X: may add ZWS for line-break control in long words
  • Arabic and Indic text: ZWNJ and ZWJ control character joining
  • Emoji sequences: family emojis and some flags (such as the rainbow flag) use ZWJ to combine base characters; country flags use pairs of regional indicator symbols instead
  • UTF-8 BOM files: some Windows text editors add a BOM at file start
  • Web scraping: scraped content often carries over zero-width characters present in the source page's markup or inserted by its JavaScript
  • Malicious content: spam, phishing, and filter-evasion text deliberately includes zero-width characters
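The BOM case from the list above is easy to reproduce in Python. The sketch below assumes a UTF-8 file whose first three bytes are the BOM; the helper `try_parse` is a hypothetical name. Decoding with the standard `utf-8-sig` codec drops a leading BOM, while plain `utf-8` keeps it as U+FEFF.

```python
import json

# UTF-8 BOM (EF BB BF) followed by a small JSON document.
raw = b'\xef\xbb\xbf{"name": "report"}'

text_with_bom = raw.decode("utf-8")    # BOM survives as a leading U+FEFF
text_clean = raw.decode("utf-8-sig")   # the codec strips a leading BOM


def try_parse(text: str):
    """Return the parsed JSON value, or None if parsing fails."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None


print(text_with_bom.startswith("\ufeff"))  # True
print(try_parse(text_with_bom))            # None: json.loads rejects the BOM
print(try_parse(text_clean))               # {'name': 'report'}
```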

Frequently Asked Questions

What are zero-width characters?
Zero-width characters are Unicode code points that take up no visual space in rendered text. They include Zero-Width Space (U+200B), Zero-Width Non-Joiner (U+200C), Zero-Width Joiner (U+200D), Byte Order Mark (U+FEFF), Word Joiner (U+2060), and others. They are often used for language formatting but can also be inserted maliciously to bypass filters or to make spoofed strings look legitimate.

Why are zero-width characters a security risk?
Zero-width characters can be used to bypass spam filters, keyword blocklists, and plagiarism detectors. Attackers insert them into phishing URLs to make them look like legitimate domains. They can also break string comparisons, cause URL routing issues, and embed hidden watermarks in documents without the reader's knowledge.

What does Insert mode do?
Insert mode lets you add zero-width spaces at specific positions in your text. This is useful for controlling line-break behavior in long strings, or for adding watermarks that can later be decoded to identify the source of a leak. You can insert between every character, between words only, or at the end of the text.

What is a Byte Order Mark (BOM)?
A Byte Order Mark (U+FEFF) is a special Unicode character that can appear at the very start of a text file to indicate its encoding. While useful for file encoding detection, a BOM at the start of JSON, CSV, or HTML files can cause parsing errors. This tool detects and removes BOMs along with all other zero-width characters.

Can zero-width characters be used to watermark documents?
Yes. Unicode steganography uses combinations of zero-width characters to embed binary data invisibly in plain text. Each recipient of a sensitive document gets a uniquely watermarked version, so if the document leaks, the watermark can be decoded to identify which recipient leaked it. The Insert mode in this tool provides a simple version of this capability.