Homoglyph Detector

Detect Unicode look-alike characters in text, URLs, and emails. Generate confusable strings for security testing. Compare visually identical strings.

What Is a Homoglyph Attack?

A homoglyph attack (also called a homograph attack or Unicode spoofing) exploits the visual similarity between characters from different scripts. The most common attack vector is domain name spoofing: an attacker registers a domain that looks identical to a legitimate domain but uses Cyrillic, Greek, or other Unicode characters instead of ASCII letters. For example, the Cyrillic letter "а" (U+0430) is indistinguishable from the Latin letter "a" (U+0061) in most fonts. A domain like pаypal.com using the Cyrillic а would look exactly like paypal.com to most users.
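
To make the difference concrete, here is a minimal Python sketch that compares the two strings and names the non-ASCII character. The helper `describe` is a hypothetical name chosen for this illustration; only the standard-library `unicodedata` module is assumed.

```python
import unicodedata

latin = "paypal.com"
spoof = "p\u0430ypal.com"  # second character is U+0430, not Latin "a"

def describe(text):
    """Return 'U+XXXX NAME' for every non-ASCII character in text."""
    return [
        f"U+{ord(ch):04X} {unicodedata.name(ch, 'UNKNOWN')}"
        for ch in text
        if ord(ch) > 127
    ]

print(latin == spoof)   # False: the strings differ despite looking alike
print(describe(spoof))  # ['U+0430 CYRILLIC SMALL LETTER A']
```

The two strings render the same in many fonts, but a plain equality check or a code point dump exposes the substitution immediately.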

Common Confusable Character Pairs

The most frequently exploited homoglyphs are cross-script lookalikes between Latin and other alphabets. Latin "a" (U+0061) is confused with Cyrillic "а" (U+0430) and Greek "α" (U+03B1). Latin "c" (U+0063) looks identical to Cyrillic "с" (U+0441). Latin "e" (U+0065) is confused with Cyrillic "е" (U+0435). Latin "o" (U+006F) resembles Cyrillic "о" (U+043E), Greek "ο" (U+03BF), and Armenian "օ" (U+0585). Latin "p" (U+0070) looks like Cyrillic "р" (U+0440). Latin "x" (U+0078) resembles Cyrillic "х" (U+0445). These pairings are documented in the official Unicode confusables.txt data file maintained by the Unicode Consortium.
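
The pairs listed above can be captured in a small lookup table. The following Python sketch is only an illustration: the table is a hand-written subset limited to the pairs named in this section (the real confusables.txt is far larger), and `CONFUSABLES`, `find_confusables`, and `to_latin` are hypothetical names, not part of this tool or any library.

```python
# Hand-written subset: look-alike character -> Latin letter it imitates.
CONFUSABLES = {
    "\u0430": "a",  # Cyrillic a
    "\u03B1": "a",  # Greek alpha
    "\u0441": "c",  # Cyrillic es
    "\u0435": "e",  # Cyrillic ie
    "\u043E": "o",  # Cyrillic o
    "\u03BF": "o",  # Greek omicron
    "\u0585": "o",  # Armenian oh
    "\u0440": "p",  # Cyrillic er
    "\u0445": "x",  # Cyrillic ha
}

def find_confusables(text):
    """Return (index, character, Latin look-alike) for each mapped character."""
    return [
        (i, ch, CONFUSABLES[ch])
        for i, ch in enumerate(text)
        if ch in CONFUSABLES
    ]

def to_latin(text):
    """Replace every mapped look-alike with the Latin letter it imitates."""
    return "".join(CONFUSABLES.get(ch, ch) for ch in text)

print(find_confusables("p\u0430ypal"))
print(to_latin("\u0440\u0430y\u0440\u0430l") == "paypal")  # True
```

Comparing `to_latin(candidate)` with a known brand name is a quick way to flag a string that only differs from it by look-alike substitutions.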

How to Detect Homoglyph Attacks

The primary detection method is checking for mixed-script identifiers: strings that contain characters from more than one Unicode script. A domain like "pаypal.com" that mixes Latin and Cyrillic is a strong indicator of a spoofing attempt. Modern browsers display such domains in Punycode (xn-- notation) to warn users. At the application level, Unicode NFKC normalization folds compatibility forms such as fullwidth letters into their ordinary equivalents, but it does not map cross-script lookalikes: Cyrillic "а" is still U+0430 after normalization. To catch those, check every character against the Unicode Consortium's confusables data, which maps known lookalikes to a common form for comparison. This tool implements a subset of the most dangerous confusable pairs for quick detection.
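
A mixed-script check can be sketched in Python as follows. The standard library does not expose the Unicode Script property, so this sketch approximates a character's script by the first word of its Unicode name, which works for the Latin, Cyrillic, Greek, and Armenian letters discussed here but is only a heuristic. `scripts_used` and `mixed_script_labels` are hypothetical names for this illustration.

```python
import unicodedata

def scripts_used(label):
    """Approximate the set of scripts in a label from Unicode character names."""
    scripts = set()
    for ch in label:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                # e.g. "CYRILLIC SMALL LETTER A" -> "CYRILLIC"
                scripts.add(name.split()[0])
    return scripts

def mixed_script_labels(domain):
    """Return the dot-separated labels that contain more than one script."""
    return [
        label
        for label in domain.split(".")
        if len(scripts_used(label)) > 1
    ]

# The first label mixes LATIN and CYRILLIC, so it is reported.
print(mixed_script_labels("p\u0430ypal.com"))
print(mixed_script_labels("paypal.com"))  # []
```

Checking each label separately avoids flagging a legitimate all-Cyrillic name under a Latin top-level domain. A production check would more likely rely on real script data and the rules in the Unicode security mechanisms specification.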

Protecting Against Homoglyph Attacks

For web applications, always display the Punycode representation of domain names when they contain non-ASCII characters. For user input validation, reject or warn when identifiers contain mixed scripts. Implement email authentication (DMARC, DKIM, SPF) to prevent spoofing of your own domain in email; these checks do not stop mail sent from a separately registered lookalike domain. For brand protection, register defensive IDN variants of your domain. ICANN and major registrars have policies restricting IDN registrations that could be confused with existing well-known domains, but coverage is incomplete. Certificate transparency logs, searchable through services such as crt.sh, can be monitored for suspicious certificates issued for lookalike domains.
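
Showing the Punycode form can be sketched with Python's built-in `idna` codec. One caveat: that codec implements the older IDNA 2003 rules, so treat this as an illustration rather than a complete IDN implementation. `display_form` is a hypothetical helper name.

```python
def display_form(domain):
    """Return the ASCII (Punycode) form when the domain has non-ASCII characters."""
    if domain.isascii():
        return domain
    # The built-in codec encodes each label and adds the "xn--" prefix.
    return domain.encode("idna").decode("ascii")

print(display_form("paypal.com"))        # paypal.com
print(display_form("p\u0430ypal.com"))   # xn--pypal-4ve.com
```

Rendering the `xn--` form next to, or instead of, the Unicode form gives users and reviewers a visible signal that the name is not plain ASCII.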

Security Research Applications

The Generate mode in this tool allows security researchers to create homoglyph variants of strings for testing purposes. This is useful for testing email filters, URL scanners, domain name validation logic, and phishing detection systems. When building these defenses, it is important to test with a comprehensive set of confusables rather than just the most obvious Latin-Cyrillic pairs. The Unicode Consortium's confusables.txt lists thousands of character pairs across all scripts.
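
Variant generation for testing can be sketched as below. This is not the tool's actual Generate mode: the `LOOKALIKES` table is a small hand-written assumption covering only a few Latin letters, and `generate_variants` is a hypothetical name. Real test suites should draw on the full confusables data.

```python
from itertools import islice, product

# Small illustrative table: Latin letter -> look-alike characters.
LOOKALIKES = {
    "a": ["\u0430", "\u03B1"],            # Cyrillic a, Greek alpha
    "c": ["\u0441"],                      # Cyrillic es
    "e": ["\u0435"],                      # Cyrillic ie
    "o": ["\u043E", "\u03BF", "\u0585"],  # Cyrillic o, Greek omicron, Armenian oh
    "p": ["\u0440"],                      # Cyrillic er
    "x": ["\u0445"],                      # Cyrillic ha
}

def generate_variants(text, limit=50):
    """Return up to `limit` variants of text with at least one substitution."""
    # Each position may keep its character or use any listed look-alike.
    choices = [[ch] + LOOKALIKES.get(ch, []) for ch in text]
    candidates = ("".join(combo) for combo in product(*choices))
    return list(islice((c for c in candidates if c != text), limit))

variants = generate_variants("ox")
print(len(variants))  # 7: (4 choices for "o") * (2 for "x") - 1 original
```

The `limit` parameter matters because the number of combinations grows multiplicatively with the length of the input string.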

Zero-Width and Invisible Characters

In addition to look-alike characters, attackers also use invisible Unicode characters like Zero Width Space (U+200B), Zero Width Non-Joiner (U+200C), and the Soft Hyphen (U+00AD) to create strings that look identical visually but differ at the byte level. These are especially dangerous in URLs, where they can bypass regex-based blacklists, and in email addresses, where they can defeat spam filters. Our Zero-Width Encoder tool demonstrates this technique.
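
A scan for the three invisible characters named above might look like this in Python. The set is deliberately limited to those three code points, and `INVISIBLE`, `find_invisible`, and `strip_invisible` are hypothetical names for this sketch.

```python
import unicodedata

# Only the characters named in this section; other invisible code points exist.
INVISIBLE = frozenset({"\u200B", "\u200C", "\u00AD"})

def find_invisible(text):
    """Return (index, code point, Unicode name) for each invisible character."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch))
        for i, ch in enumerate(text)
        if ch in INVISIBLE
    ]

def strip_invisible(text):
    """Remove the listed invisible characters before comparing or filtering."""
    return "".join(ch for ch in text if ch not in INVISIBLE)

suspect = "pay\u200Bpal.com"
print(suspect == "paypal.com")                   # False
print(find_invisible(suspect))                   # [(3, 'U+200B', 'ZERO WIDTH SPACE')]
print(strip_invisible(suspect) == "paypal.com")  # True
```

Stripping is reasonable for URLs and email addresses. In running text, some of these characters are legitimate (for example, the Zero Width Non-Joiner in Persian), so reporting them is safer there than silently removing them.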

Frequently Asked Questions

A homoglyph attack uses Unicode characters that look identical to ASCII letters to impersonate legitimate domain names, email addresses, or text. For example, the Cyrillic "а" (U+0430) is visually identical to the Latin "a" (U+0061). An attacker can register pаypal.com using Cyrillic а to create a phishing domain that looks exactly like paypal.com.

Modern browsers display internationalized domain names (IDNs) in Punycode notation (e.g., xn--pypal-4ve.com) when the domain mixes scripts or contains characters that could be confused with ASCII. Chrome, Firefox, and Safari all have heuristics to detect confusable domains. However, all-Cyrillic or all-Greek domains that don't mix scripts may still display in Unicode, so the protection is not complete.

The Unicode Consortium maintains an official confusables.txt data file listing character pairs that can be visually confused. This includes Cyrillic-Latin pairs (а/a, с/c, е/e, о/o), Greek-Latin pairs (ο/o, α/a), and many others across all scripts. The official list contains thousands of pairs and is used by security tools and browsers.

Register defensive IDN variants of your domain. Implement DMARC, DKIM, and SPF for email authentication. Monitor certificate transparency logs (for example through crt.sh) for suspicious lookalike certificates. Train users to check URLs carefully. Major registrars block known confusable registrations for popular brands, but coverage is incomplete for smaller brands.

Generating homoglyph strings is a legitimate security research technique used by penetration testers and developers. This tool is designed for testing defenses and understanding the attack surface. Using homoglyphs to actually deceive users, for example by registering phishing domains or sending deceptive emails, is illegal in most jurisdictions and violates ICANN policies.