How do I type a Unicode character by codepoint?

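On Windows, hold Alt and type 0 followed by the decimal value on the numeric keypad. This is reliable for Latin-1 characters (U+00A0–U+00FF), e.g. Alt+0233 gives é, because Western-locale Windows code pages match Unicode in that range. For any codepoint in Microsoft Word and some other Microsoft apps, type the hex codepoint followed by Alt+X (e.g., '2665' then Alt+X gives ♥). On macOS, enable the Unicode Hex Input keyboard layout and hold Option while typing the 4-digit hex codepoint.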

What is the difference between UTF-8, UTF-16, and UTF-32?

These are three different ways to encode Unicode codepoints as bytes. UTF-8 uses 1–4 bytes per character and is the dominant encoding for web content. UTF-16 uses 2 or 4 bytes and is used internally by JavaScript and Windows. UTF-32 always uses 4 bytes per character, making it simple but memory-inefficient.
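A quick way to see these size differences is to encode a few characters and count the bytes. The sketch below uses only Python's built-in codecs. `encoded_sizes` is a hypothetical helper name. The little-endian codec variants are chosen only so that Python does not add a byte-order mark to the count.

```python
def encoded_sizes(ch: str) -> tuple[int, int, int]:
    """Return the byte length of `ch` in UTF-8, UTF-16 and UTF-32."""
    # The "-le" (little-endian) codecs are used so that Python does not
    # prepend a 2- or 4-byte byte-order mark (BOM) to the output.
    return (
        len(ch.encode("utf-8")),
        len(ch.encode("utf-16-le")),
        len(ch.encode("utf-32-le")),
    )


# "A", "é", "→" and "😀", written as escapes to avoid source-encoding issues.
for ch in ["A", "\u00e9", "\u2192", "\U0001F600"]:
    utf8, utf16, utf32 = encoded_sizes(ch)
    print(f"U+{ord(ch):04X}  UTF-8: {utf8}  UTF-16: {utf16}  UTF-32: {utf32}")
```

For these four characters the byte counts (UTF-8/UTF-16/UTF-32) are 1/2/4, 2/2/4, 3/2/4 and 4/4/4. UTF-32 stays fixed while the other two grow with the codepoint.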

Unicode Character Search

Search characters by name, hex code, decimal, or browse by block and category. Click any character to copy it.


How to Use the Unicode Character Search

Search mode — type a character name (e.g., "arrow"), a hex codepoint (e.g., "U+2192" or "2192"), or a decimal number. Results update as you type.
By Block — select a Unicode block from the dropdown to browse all characters in that range.
By Category — filter by Unicode general category such as Uppercase Letter, Math Symbol, or Other Punctuation.
Copy a character — click any character card to copy it to your clipboard instantly.
Grid / List view — toggle between a compact grid and a detailed list that shows the full character name.
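The page does not say exactly how a query is matched, so the following is only a sketch of one plausible approach, using Python's standard `unicodedata` module:

- An explicit "U+" prefix is read as hex.
- A bare run of hex digits is tried as hex and, if it contains only 0–9, also as decimal.
- Name search is a separate, case-insensitive substring match over character names.

Both function names are hypothetical.

```python
import re
import unicodedata

MAX_CODEPOINT = 0x10FFFF


def codepoint_candidates(query: str) -> list[int]:
    """Interpret a query such as "U+2192", "2192" or "8594" as codepoints."""
    q = query.strip()
    candidates: list[int] = []
    explicit = re.fullmatch(r"[Uu]\+([0-9A-Fa-f]{1,6})", q)
    if explicit:
        # "U+2192": the prefix makes the value unambiguously hexadecimal.
        candidates.append(int(explicit.group(1), 16))
    elif re.fullmatch(r"[0-9A-Fa-f]{1,7}", q):
        # Bare "2192": try it as hex first ...
        candidates.append(int(q, 16))
        # ... and, if it has only decimal digits, as a decimal number as well.
        if q.isdigit():
            candidates.append(int(q, 10))
    # Keep values inside the codespace, drop duplicates, preserve order.
    result: list[int] = []
    for cp in candidates:
        if cp <= MAX_CODEPOINT and cp not in result:
            result.append(cp)
    return result


def search_by_name(term: str, limit: int = 20) -> list[int]:
    """Return up to `limit` codepoints whose Unicode name contains `term`."""
    needle = term.strip().upper()  # character names are stored in upper case
    if not needle:
        return []
    hits: list[int] = []
    for cp in range(MAX_CODEPOINT + 1):
        # unicodedata.name() returns the default "" for unnamed codepoints.
        if needle in unicodedata.name(chr(cp), ""):
            hits.append(cp)
            if len(hits) >= limit:
                break
    return hits
```

With these rules, `codepoint_candidates("2192")` returns both 0x2192 (→) and decimal 2192, which is U+0890. A real interface therefore needs a tie-breaking rule. Scanning all 1.1 million codepoints is acceptable for a sketch, but a tool that updates as you type would more likely pre-build a name index.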

What Is Unicode?

Unicode is an international encoding standard that assigns a unique number — called a codepoint — to every character used in human writing systems, plus thousands of symbols, emoji, and technical characters. The Unicode standard currently defines over 149,000 characters spanning more than 160 scripts, from Ancient Egyptian hieroglyphs to modern emoji. Before Unicode, different computer systems used incompatible character encodings, making cross-platform text exchange error-prone. Unicode solved this by providing a single, universal character set that every modern operating system, browser, and application can use.

Unicode Blocks and Planes

The Unicode codespace is organised into 17 planes, each containing 65,536 codepoints. The first plane, the Basic Multilingual Plane (BMP, U+0000–U+FFFF), contains virtually all characters needed for modern text, including Latin, Greek, Cyrillic, Arabic, Hebrew, Chinese, Japanese, and Korean characters. The supplementary planes (U+10000 and above) contain historic scripts, musical notation, mathematical alphanumerics, and most emoji; a few older symbols used as emoji, such as ♥ (U+2665), remain in the BMP. Within each plane, codepoints are grouped into named blocks that make navigation easier. For example, the Arrows block (U+2190–U+21FF) contains 112 arrow symbols, while the Box Drawing block (U+2500–U+257F) contains the line-drawing characters used in terminal UIs.
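Because every plane holds exactly 65,536 (2^16) codepoints, a codepoint's plane is just its value divided by 0x10000, and the remainder is its position inside that plane. A minimal Python sketch; `plane_and_offset` is a hypothetical helper name.

```python
PLANE_SIZE = 0x10000        # 65,536 codepoints per plane
LAST_CODEPOINT = 0x10FFFF   # highest valid codepoint


def plane_and_offset(cp: int) -> tuple[int, int]:
    """Split a codepoint into (plane number, position within that plane)."""
    if not 0 <= cp <= LAST_CODEPOINT:
        raise ValueError(f"{cp:#x} is outside the Unicode codespace")
    return divmod(cp, PLANE_SIZE)


# Planes 0 through 16 make 17 planes in total.
assert LAST_CODEPOINT // PLANE_SIZE + 1 == 17

for cp in (0x0041, 0x2192, 0x2500, 0x1F600):
    plane, offset = plane_and_offset(cp)
    where = "BMP" if plane == 0 else "supplementary plane"
    print(f"U+{cp:04X}: plane {plane} ({where}), offset {offset:#06x}")
```

The first three samples (Latin A, the Arrows block, and the Box Drawing block) land in plane 0. The emoji U+1F600 lands in plane 1.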

General Categories

Every Unicode character is assigned a general category that describes its broad type. The major categories are: Letter (L), Mark (M), Number (N), Punctuation (P), Symbol (S), Separator (Z), and Other (C). These are further subdivided; for example, Letter splits into Uppercase Letter (Lu), Lowercase Letter (Ll), Titlecase Letter (Lt), Modifier Letter (Lm), and Other Letter (Lo). Understanding categories is important for text processing: regular expressions use Unicode categories to match character classes (for example, \p{Lu} matches any uppercase letter in engines that support Unicode property escapes). Bidirectional text layout, by contrast, is driven by a separate property, Bidi_Class, rather than by the general category.
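Python exposes the general category through `unicodedata.category()`. Its built-in `re` module does not support `\p{...}` classes; the third-party `regex` module does. The sketch below prints the category of a few characters and implements a simple filter similar to a "By Category" view. `filter_by_category` is a hypothetical helper.

```python
import unicodedata


def filter_by_category(chars: str, category: str) -> list[str]:
    """Keep characters whose general category matches `category`.

    A one-letter value such as "L" selects the whole major class (Lu, Ll, Lt,
    Lm, Lo); a two-letter value such as "Lu" selects only that subcategory.
    """
    return [ch for ch in chars if unicodedata.category(ch).startswith(category)]


samples = ["A", "a", "\u01C5", "5", "\u2192", "!", " ", "\u0301"]
for ch in samples:
    print(f"U+{ord(ch):04X}  {unicodedata.category(ch)}  {unicodedata.name(ch)}")

print(filter_by_category("".join(samples), "L"))
```

The loop reports Lu, Ll, Lt, Nd, Sm, Po, Zs and Mn for the eight samples, one from each of several categories listed above. The "L" filter keeps only the first three.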

Encoding: UTF-8, UTF-16, and UTF-32

Unicode defines codepoints, but encoding standards define how those numbers are stored as bytes. UTF-8 encodes ASCII characters as single bytes and uses 2–4 bytes for higher codepoints, making it backward-compatible with ASCII and ideal for web content. UTF-16 uses 2 bytes for characters in the BMP and 4 bytes (a surrogate pair) for supplementary characters; it is the internal encoding used by JavaScript strings, Java, and the Windows API. UTF-32 always uses 4 bytes per character, which makes random access by codepoint index simple but roughly quadruples storage compared to UTF-8 for ASCII-heavy text. When copying characters from this tool, they are placed on your clipboard in your system's native encoding.
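The surrogate-pair scheme used by UTF-16 is simple arithmetic:

1. Subtract 0x10000 from the codepoint to get a 20-bit value.
2. Put the top 10 bits into a high surrogate (D800–DBFF).
3. Put the bottom 10 bits into a low surrogate (DC00–DFFF).

This is also why JavaScript reports a `length` of 2 for a single emoji such as "😀": `length` counts UTF-16 code units, not codepoints. A minimal Python sketch follows; the function names are hypothetical, and the result is cross-checked against Python's own UTF-16 encoder.

```python
def to_surrogates(cp: int) -> tuple[int, int]:
    """Return the UTF-16 (high, low) surrogate pair for a supplementary codepoint."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only codepoints U+10000..U+10FFFF use surrogate pairs")
    v = cp - 0x10000              # a 20-bit value
    high = 0xD800 + (v >> 10)     # top 10 bits -> D800..DBFF
    low = 0xDC00 + (v & 0x3FF)    # bottom 10 bits -> DC00..DFFF
    return high, low


def from_surrogates(high: int, low: int) -> int:
    """Inverse of to_surrogates()."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


high, low = to_surrogates(0x1F600)
print(f"U+1F600 -> {high:04X} {low:04X}")   # U+1F600 -> D83D DE00

# Cross-check against Python's built-in big-endian UTF-16 encoder.
assert "\U0001F600".encode("utf-16-be") == bytes.fromhex("D83DDE00")
assert from_surrogates(high, low) == 0x1F600
```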

Using Unicode in Code

Most programming languages support Unicode escape sequences. In JavaScript, you can write '\u2192' for a rightward arrow (→) or '\u{1F600}' for an emoji using the ES6 extended escape. In Python 3, all strings are Unicode by default and you can use '\u2192' or '\N{RIGHTWARDS ARROW}'. In CSS, use content: '\2192' for pseudo-element content. HTML numeric character references work too: &#x2192; (hex) or &#8594; (decimal). This tool shows all these representations for every character, making it easy to copy the right format for your use case.
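The representations listed above can all be derived from the codepoint alone, plus the character name for Python's `\N{...}` form. This Python sketch builds them for any single character. The dictionary keys are hypothetical labels and are not necessarily the ones this tool uses.

```python
import unicodedata


def representations(ch: str) -> dict[str, str]:
    """Build common escape forms for a single character."""
    cp = ord(ch)
    name = unicodedata.name(ch, "")
    return {
        "codepoint": f"U+{cp:04X}",
        # JavaScript: \uXXXX inside the BMP, ES6 \u{...} for anything larger.
        "javascript": f"\\u{cp:04X}" if cp <= 0xFFFF else "\\u{" + f"{cp:X}" + "}",
        # Python: \uXXXX / \UXXXXXXXX, or \N{...} when the character has a name.
        "python": f"\\u{cp:04X}" if cp <= 0xFFFF else f"\\U{cp:08X}",
        "python_named": "\\N{" + name + "}" if name else "",
        "css": f"\\{cp:X}",
        "html_hex": f"&#x{cp:X};",
        "html_dec": f"&#{cp};",
    }


for form, text in representations("\u2192").items():
    print(f"{form:13} {text}")

# The different Python spellings all produce the same one-character string.
assert "\u2192" == "\N{RIGHTWARDS ARROW}" == chr(0x2192)
```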

Common Use Cases

Developers use Unicode search to find arrow symbols for documentation, box-drawing characters for terminal applications, mathematical operators for formula rendering, and special punctuation for typography. Designers look for decorative symbols, dingbats, and geometric shapes. Security researchers use it to identify homoglyphs — characters that look identical to ASCII letters but have different codepoints — which are used in phishing attacks and typosquatting domains. Content writers use it to insert special punctuation like em dashes (—), non-breaking spaces, and typographic quotes that are not available on standard keyboard layouts.
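To see why homoglyphs are dangerous, compare a string containing a Cyrillic "а" (U+0430) with its all-ASCII twin. The Python sketch below simply flags every non-ASCII character and names it. `suspicious_chars` is a hypothetical helper. A production check would more likely use the confusables data published with Unicode Technical Standard #39 (Unicode Security Mechanisms).

```python
import unicodedata


def suspicious_chars(text: str) -> list[tuple[int, str, str]]:
    """Return (index, codepoint, name) for each non-ASCII character in `text`."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for i, ch in enumerate(text)
        if ord(ch) > 0x7F
    ]


spoofed = "ex\u0430mple.com"        # third letter is CYRILLIC SMALL LETTER A
print(spoofed == "example.com")     # False, although both render almost identically
print(suspicious_chars(spoofed))    # [(2, 'U+0430', 'CYRILLIC SMALL LETTER A')]
```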

Frequently Asked Questions

What is a Unicode codepoint?

A Unicode codepoint is a unique number assigned to every character in the Unicode standard. Codepoints are written as U+ followed by 4–6 hex digits. For example, U+0041 is the Latin capital letter A, and U+1F600 is the grinning face emoji. There are over 1.1 million possible codepoints, of which about 149,000 are currently assigned.
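In Python, `ord()` and `chr()` convert between a character and its codepoint. The `04X` format spec pads to at least four upper-case hex digits, so short codepoints get leading zeros while supplementary ones naturally use five or six. `to_u_notation` is a hypothetical helper name.

```python
def to_u_notation(cp: int) -> str:
    """Format a codepoint as U+ followed by at least four upper-case hex digits."""
    return f"U+{cp:04X}"


CODESPACE_SIZE = 0x10FFFF + 1   # U+0000 through U+10FFFF

# Prints U+0041, U+2665, U+1F600 and U+10FFFF on separate lines.
for ch in ("A", "\u2665", "\U0001F600", chr(0x10FFFF)):
    print(to_u_notation(ord(ch)))

print(CODESPACE_SIZE)   # 1114112
```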

How do I type a Unicode character by codepoint?

On Windows, in Microsoft Word and some other Microsoft apps, you can type the hex codepoint (e.g., 2665) followed by Alt+X to insert the character (♥). On macOS, enable the Unicode Hex Input keyboard layout, then hold Option while typing the 4-digit hex code. Alternatively, use this tool to find and copy any character with a single click.

What is the difference between UTF-8 and UTF-16?

Both are ways of encoding Unicode codepoints as bytes. UTF-8 uses 1–4 bytes per character and is backward-compatible with ASCII. UTF-16 uses 2 or 4 bytes and is used internally by JavaScript, Java, and Windows. UTF-8 is preferred for web content because it is more compact for ASCII-heavy text and has no byte-order ambiguity.
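The byte-order ambiguity is easy to see by encoding the same two characters as UTF-16 in both byte orders and as UTF-8. This Python sketch uses only built-in codecs; the sample string is arbitrary.

```python
text = "A\u2192"   # "A→"

# UTF-16 comes in two byte orders: little-endian and big-endian.
print(text.encode("utf-16-le").hex(" "))   # 41 00 92 21
print(text.encode("utf-16-be").hex(" "))   # 00 41 21 92
# UTF-8 has a single byte order, so there is nothing to signal.
print(text.encode("utf-8").hex(" "))       # 41 e2 86 92

# Plain "utf-16" prepends a byte-order mark (BOM) so a decoder can tell which
# order was used; decoding with "utf-16" reads the BOM and strips it.
encoded = text.encode("utf-16")
assert encoded[:2] in (b"\xff\xfe", b"\xfe\xff")
assert encoded.decode("utf-16") == text
```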

What is a Unicode block?

A Unicode block is a contiguous range of codepoints assigned to a particular script or symbol group. For example, Basic Latin covers U+0000–U+007F, Greek and Coptic covers U+0370–U+03FF, and Emoticons covers U+1F600–U+1F64F. Unicode 15 defines more than 320 named blocks.
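Because blocks are contiguous, non-overlapping ranges, finding the block of a character can be a binary search over the range start points. The table in this Python sketch is a hypothetical excerpt holding only the blocks named on this page. A full implementation would load every range from Blocks.txt in the Unicode Character Database.

```python
import bisect
from typing import Optional

# Hypothetical excerpt of the block table: (first, last, name), sorted by first.
BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0370, 0x03FF, "Greek and Coptic"),
    (0x2190, 0x21FF, "Arrows"),
    (0x2500, 0x257F, "Box Drawing"),
    (0x1F600, 0x1F64F, "Emoticons"),
]
_STARTS = [first for first, _, _ in BLOCKS]


def block_of(cp: int) -> Optional[str]:
    """Return the name of the block containing `cp`, or None if not in the table."""
    i = bisect.bisect_right(_STARTS, cp) - 1   # last block starting at or before cp
    if i >= 0 and cp <= BLOCKS[i][1]:
        return BLOCKS[i][2]
    return None


for cp in (0x0041, 0x03B1, 0x1F600, 0x0100):
    print(f"U+{cp:04X}: {block_of(cp)}")
```

U+0100 prints None here only because Latin Extended-A is missing from this excerpt. With the full table, every assigned codepoint falls in some block.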

Is the Unicode Search tool private?

Yes. The Unicode Search tool runs entirely in your browser. No text you type or characters you copy are sent to any server. The character database is embedded in the page JavaScript itself.