String Length Calculator

Characters, bytes (UTF-8/UTF-16/ASCII), codepoints, graphemes, URL-encoded length, and Base64 length — all at once.

How to Use the String Length Calculator

  1. Paste your string into the input area. Metrics update instantly as you type.
  2. Summary mode — shows all key length metrics at a glance: characters, bytes (UTF-8, UTF-16, ASCII), codepoints, graphemes, URL-encoded length, and Base64 length.
  3. Encoding mode — adds a per-character breakdown table showing each character's codepoint and UTF-8 byte sequence.
  4. Compare mode — paste two strings side by side and see a difference table for all metrics.
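
The per-character breakdown shown in Encoding mode can be approximated in a few lines of JavaScript. This is a minimal sketch, not the tool's actual implementation: the helper name encodingBreakdown and its row shape are hypothetical, and it assumes a runtime that provides TextEncoder (modern browsers and Node.js).

```javascript
// Hypothetical helper: returns one row per codepoint with its UTF-8 bytes in hex.
function encodingBreakdown(text) {
  const encoder = new TextEncoder();
  // Spreading a string iterates by codepoint, not by UTF-16 code unit.
  return [...text].map((char) => {
    const hex = char.codePointAt(0).toString(16).toUpperCase().padStart(4, "0");
    const utf8 = [...encoder.encode(char)]
      .map((byte) => byte.toString(16).toUpperCase().padStart(2, "0"))
      .join(" ");
    return { char, codepoint: "U+" + hex, utf8 };
  });
}

// "a", "é" (U+00E9), "€" (U+20AC), and "😀" (U+1F600)
console.table(encodingBreakdown("a\u00E9\u20AC\u{1F600}"));
```

Each row pairs a character with its codepoint and byte sequence, for example U+00E9 with C3 A9 and U+1F600 with F0 9F 98 80.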

Understanding String Length Metrics

Character Count vs. Grapheme Count

When people ask "how many characters is this?", they usually want the grapheme count: the number of symbols a human would count by pointing at each one. This differs from JavaScript's string.length, which counts UTF-16 code units, and from the Unicode codepoint count, which counts individual Unicode values. A family emoji like 👨‍👩‍👧 is 1 grapheme, 5 Unicode codepoints (three person emoji joined by two zero-width joiners), and 8 UTF-16 code units. This tool shows all three counts so you know exactly what each API or database field will measure.
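
To see the three counts side by side in your own code, the sketch below may help. It assumes a runtime with Intl.Segmenter (current browsers and Node.js 16 or later); the function name countUnits is hypothetical and chosen only for illustration.

```javascript
// Hypothetical helper: reports the three "lengths" discussed above.
function countUnits(text) {
  // Grapheme segmentation follows the Unicode rules built into the runtime.
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  return {
    codeUnits: text.length,                          // UTF-16 code units
    codepoints: [...text].length,                    // Unicode codepoints
    graphemes: [...segmenter.segment(text)].length,  // user-perceived characters
  };
}

// "e" followed by a combining acute accent renders as a single "é".
console.log(countUnits("e\u0301"));
// { codeUnits: 2, codepoints: 2, graphemes: 1 }

// Thumbs-up emoji with a skin tone modifier.
console.log(countUnits("\u{1F44D}\u{1F3FD}"));
// { codeUnits: 4, codepoints: 2, graphemes: 1 }
```

Grapheme boundaries depend on the Unicode version shipped with the runtime, so very recent emoji sequences may be counted differently on older engines.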

UTF-8 Byte Count

UTF-8 is the standard encoding for web content, JSON, and most databases. It uses a variable number of bytes per codepoint: 1 byte for ASCII characters (U+0000–U+007F), 2 bytes for accented Latin letters and scripts such as Greek, Cyrillic, Hebrew, and Arabic (U+0080–U+07FF), 3 bytes for most CJK (Chinese, Japanese, Korean) characters and common symbols (U+0800–U+FFFF), and 4 bytes for most emoji and other supplementary characters (U+10000 and above). The UTF-8 byte count determines whether a string fits in a VARCHAR or TEXT column whose limit is defined in bytes rather than characters.
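
The byte ranges above translate directly into code. The following sketch sums the per-codepoint sizes and checks the result against TextEncoder, which is assumed to be available in your runtime; the function names are hypothetical.

```javascript
// Bytes needed to encode a single codepoint in UTF-8, using the ranges above.
function utf8BytesForCodepoint(codepoint) {
  if (codepoint <= 0x7f) return 1;    // ASCII
  if (codepoint <= 0x7ff) return 2;   // accented Latin, Greek, Cyrillic, ...
  if (codepoint <= 0xffff) return 3;  // rest of the BMP, including most CJK
  return 4;                           // supplementary planes, most emoji
}

// Total UTF-8 size of a string, summed codepoint by codepoint.
function utf8ByteLength(text) {
  let total = 0;
  for (const char of text) {
    total += utf8BytesForCodepoint(char.codePointAt(0));
  }
  return total;
}

const sample = "a\u00E9\u4E2D\u{1F600}"; // "a", "é", "中", "😀"
console.log(utf8ByteLength(sample));                   // 10 (1 + 2 + 3 + 4)
console.log(new TextEncoder().encode(sample).length);  // 10
```

In practice the TextEncoder call alone is enough; the manual version is shown only to make the size rules explicit.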

UTF-16 Byte Count

UTF-16 is the internal string format used by JavaScript, Java, C#, and Windows APIs. It uses one 2-byte code unit for each character in the Basic Multilingual Plane and two code units (a 4-byte surrogate pair) for supplementary characters such as most emoji and historic scripts. JavaScript's string.length, Java's String.length(), and .NET's string.Length all count UTF-16 code units, so the UTF-16 byte count is twice the value those APIs report. Knowing the UTF-16 byte count is useful when working with Windows native APIs, Java, or .NET string handling.
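
Because JavaScript strings are already sequences of UTF-16 code units, the UTF-16 byte count can be derived from the string length. A minimal sketch, assuming no byte order mark is added (the function name is hypothetical):

```javascript
// UTF-16 size in bytes: every code unit occupies 2 bytes.
// Assumption: no byte order mark (BOM) is prepended to the output.
function utf16ByteLength(text) {
  return text.length * 2;
}

const mixed = "A\u4E2D\u{1F600}"; // "A", "中", "😀"
console.log(mixed.length);           // 4 code units (1 + 1 + 2)
console.log(utf16ByteLength(mixed)); // 8 bytes
```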

ASCII Byte Count

The ASCII-compatible byte count assumes the string will pass through a system that supports only 7-bit ASCII. True ASCII cannot represent non-ASCII characters, so they are counted as if percent-encoded: each byte of a non-ASCII character's UTF-8 encoding becomes a 3-byte %XX sequence. This approximates the length overhead of transmitting the string through protocols that require ASCII, in the same way URL encoding with encodeURIComponent expands non-ASCII characters.
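
One way to read the description above is sketched below. It is an interpretation rather than the tool's confirmed logic: ASCII bytes count once, every non-ASCII UTF-8 byte counts as a 3-byte %XX sequence, and reserved ASCII characters such as spaces are assumed to stay unescaped. The function name is hypothetical.

```javascript
// Hypothetical reading of the ASCII-compatible metric:
// 1 byte per ASCII byte, 3 bytes (%XX) per non-ASCII UTF-8 byte.
function asciiCompatibleLength(text) {
  let total = 0;
  for (const byte of new TextEncoder().encode(text)) {
    total += byte < 0x80 ? 1 : 3;
  }
  return total;
}

console.log(asciiCompatibleLength("abc"));       // 3
console.log(asciiCompatibleLength("caf\u00E9")); // 9  (3 ASCII bytes + 2 escaped bytes)
console.log(asciiCompatibleLength("\u{1F600}")); // 12 (4 escaped bytes)
```

Under this reading the result matches encodeURIComponent for text without reserved ASCII characters, but differs for input such as "a b", where encodeURIComponent also escapes the space.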

URL-Encoded Length

URL encoding (percent-encoding) replaces each reserved ASCII character with a %XX sequence and each non-ASCII character with one %XX sequence per UTF-8 byte. The URL-encoded length is the length of the string after encodeURIComponent() is applied. This is the relevant length for query string parameters, path segments, and form data submitted via GET. The HTTP specification defines no maximum URL length, but about 2,000 characters is a widely used conservative limit. If your URL-encoded string approaches 2,000 characters, consider using POST instead of GET.
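
As a quick illustration, the sketch below measures the encoded length and checks it against a length budget. The helper names and the 2,000-character default are assumptions for the example, not fixed limits.

```javascript
// Length of a value after percent-encoding with encodeURIComponent.
// Note: encodeURIComponent throws a URIError on unpaired surrogates.
function urlEncodedLength(text) {
  return encodeURIComponent(text).length;
}

// Hypothetical budget check: does prefix + encoded value stay within the limit?
const URL_LENGTH_BUDGET = 2000;
function fitsInUrl(prefix, value, budget = URL_LENGTH_BUDGET) {
  return prefix.length + urlEncodedLength(value) <= budget;
}

console.log(urlEncodedLength("a b&c"));      // 9  ("a%20b%26c")
console.log(urlEncodedLength("caf\u00E9"));  // 9  ("caf%C3%A9")
console.log(urlEncodedLength("\u{1F600}"));  // 12 ("%F0%9F%98%80")
console.log(fitsInUrl("https://example.com/search?q=", "x".repeat(1990))); // false
```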

Base64 Length

Base64 encodes binary data (or UTF-8 text) as ASCII characters, using 4 Base64 characters for every 3 bytes and padding the output with = so that its length is a multiple of 4. With padding, the Base64 length of a string is ceil(utf8_bytes / 3) * 4. Base64 is commonly used in data URIs (embedding images in CSS), JSON Web Tokens (JWT), and HTTP Basic Authentication headers. Knowing the Base64 length helps when you need to check if a Base64-encoded value will fit within a database column, cookie size limit, or API field length restriction.
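
The length formula can be checked against real encoder output. This sketch assumes a runtime that provides both TextEncoder and btoa (browsers and Node.js 16 or later), and the function names are hypothetical.

```javascript
// Predicted length of padded Base64 output for a given number of input bytes.
function base64LengthFromBytes(byteCount) {
  return Math.ceil(byteCount / 3) * 4;
}

// Encode text as UTF-8 first, then Base64, so non-ASCII input is handled.
function base64EncodeUtf8(text) {
  let binary = "";
  for (const byte of new TextEncoder().encode(text)) {
    binary += String.fromCharCode(byte);
  }
  return btoa(binary);
}

for (const text of ["", "hello", "caf\u00E9", "\u{1F600}"]) {
  const byteCount = new TextEncoder().encode(text).length;
  const predicted = base64LengthFromBytes(byteCount);
  const actual = base64EncodeUtf8(text).length;
  console.log(byteCount, predicted, actual); // predicted and actual agree
}
```

Unpadded variants such as the base64url form used in JWTs omit the trailing = characters, so their output can be up to 2 characters shorter than this formula predicts.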

Compare Mode

Compare mode is useful for debugging encoding mismatches between two versions of the same string, checking whether a field was trimmed or modified by a system, or verifying that two strings that look identical are truly identical at the byte level. Paste both strings and the tool shows the difference in every length metric. If two strings look the same but have different byte counts, there is likely a hidden character (zero-width space, non-breaking space, or BOM) in one of them — use the Zero-Width Character Detector or Whitespace Visualizer to find it.
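
A comparison of this kind can be sketched in code as follows. The metric set, the function names, and the short list of hidden characters are illustrative assumptions, not the tool's actual implementation.

```javascript
// A small subset of the metrics, enough to expose byte-level differences.
function measure(text) {
  return {
    codeUnits: text.length,
    codepoints: [...text].length,
    utf8Bytes: new TextEncoder().encode(text).length,
  };
}

// Difference (second minus first) for every metric.
function compareStrings(first, second) {
  const a = measure(first);
  const b = measure(second);
  const diff = {};
  for (const key of Object.keys(a)) diff[key] = b[key] - a[key];
  return diff;
}

// Report zero-width spaces, non-breaking spaces, and BOMs with their offsets.
function findHiddenCharacters(text) {
  const hidden = new Set([0x200b, 0x00a0, 0xfeff]);
  const found = [];
  let index = 0; // offset in UTF-16 code units
  for (const char of text) {
    const codepoint = char.codePointAt(0);
    if (hidden.has(codepoint)) {
      const hex = codepoint.toString(16).toUpperCase().padStart(4, "0");
      found.push({ index, codepoint: "U+" + hex });
    }
    index += char.length;
  }
  return found;
}

const clean = "helloworld";
const suspect = "hello\u200Bworld"; // looks identical when rendered
console.log(compareStrings(clean, suspect)); // { codeUnits: 1, codepoints: 1, utf8Bytes: 3 }
console.log(findHiddenCharacters(suspect));  // [ { index: 5, codepoint: 'U+200B' } ]
```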

Frequently Asked Questions

What is the difference between character count and byte count?

Character count is the number of characters in the string, measured as either Unicode codepoints or grapheme clusters. Byte count is the number of bytes needed to store the string in a specific encoding. In UTF-8, ASCII characters take 1 byte, accented Latin letters take 2 bytes, most CJK characters take 3 bytes, and most emoji take 4 bytes. A string of 10 single-codepoint emoji such as 😀 has a character count of 10 but a UTF-8 byte count of 40.

Why does JavaScript report a length of 2 for a single emoji?

JavaScript strings are sequences of UTF-16 code units. Emoji outside the Basic Multilingual Plane are encoded as surrogate pairs of two UTF-16 code units each. The .length property counts code units, not characters, so a single emoji such as 😀 reports a length of 2. This tool shows both the code unit count (what JavaScript sees) and the grapheme count (what humans perceive).

What is the difference between UTF-8 and UTF-16?

UTF-8 uses 1 to 4 bytes per character depending on the codepoint value. UTF-16 uses 2 bytes per BMP character and 4 bytes for characters above U+FFFF (most emoji). For ASCII text, UTF-8 is more compact. For CJK-heavy text, UTF-16 is usually more compact, at 2 bytes per character versus 3 in UTF-8. UTF-16 is used internally by JavaScript, Java, C#, and Windows APIs.

What is URL-encoded length?

URL encoding converts non-ASCII and special characters to %XX sequences. The URL-encoded length is the length of the string after encodeURIComponent() is applied, which is useful for checking whether a string fits within URL length limits (commonly treated as about 2,000 characters). Each non-ASCII byte becomes 3 characters (%XX), so a single 4-byte emoji expands from 1 character to 12, and multi-codepoint emoji expand further.

What is a grapheme cluster?

A grapheme cluster is the smallest unit of text that a user perceives as a single character. It may consist of multiple Unicode codepoints: a base letter plus a combining diacritic, an emoji with a skin tone modifier, or a flag emoji. The grapheme count determines how many characters appear in a rendered text box and is what most people mean when they say "character count."