Question 1

What is the difference between character count and byte count?

Accepted Answer

Character count is the number of characters in the string, measured either as Unicode codepoints or as grapheme clusters (user-perceived characters). Byte count is the number of bytes needed to store the string in a specific encoding. In UTF-8, ASCII characters take 1 byte each, precomposed Latin letters with diacritics such as é take 2 bytes, most CJK characters take 3 bytes, and most emoji take 4 bytes per codepoint. A string of 10 single-codepoint emoji such as 😀 has a character count of 10 but a UTF-8 byte count of 40.
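
As a rough illustration of the numbers above, the following sketch counts codepoints and UTF-8 bytes for a few sample strings. The helper names codepointCount and utf8ByteCount are made up for this example and are not taken from the tool's source; TextEncoder is the standard encoding API available in browsers and Node.js.

```javascript
// Hypothetical helpers for illustration only.

// The string iterator yields whole codepoints (not UTF-16 code units),
// so spreading the string into an array counts codepoints.
function codepointCount(text) {
  return [...text].length;
}

// TextEncoder always encodes to UTF-8; the resulting Uint8Array's
// length is the UTF-8 byte count.
function utf8ByteCount(text) {
  return new TextEncoder().encode(text).length;
}

const samples = [
  "A",                     // ASCII letter
  "\u00E9",                // é, precomposed Latin letter with diacritic
  "\u4E2D",                // 中, CJK ideograph
  "\u{1F600}",             // 😀, emoji outside the BMP
  "\u{1F600}".repeat(10),  // ten emoji
];

for (const s of samples) {
  console.log(`${s}: ${codepointCount(s)} codepoint(s), ${utf8ByteCount(s)} UTF-8 byte(s)`);
}
```

The last sample reports 10 codepoints and 40 bytes, matching the example in the answer.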

Question 2

Why does JavaScript report emoji as having length 2?

Accepted Answer

JavaScript strings are sequences of UTF-16 code units. Emoji and other characters outside the Basic Multilingual Plane (BMP) are encoded as surrogate pairs, which take two UTF-16 code units each. The .length property counts code units, not characters, so a single-codepoint emoji reports as length 2, and an emoji built from several codepoints (for example, one with a skin tone modifier) reports an even larger value. This tool shows both the code unit count (what JavaScript sees) and the grapheme count (what humans perceive).
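
The surrogate pair can be inspected directly. This is a minimal sketch; codeUnitsHex is a hypothetical helper written for this example, while charCodeAt and codePointAt are standard String methods.

```javascript
// Hypothetical helper: list the UTF-16 code units of a string as hex.
function codeUnitsHex(text) {
  const units = [];
  for (let i = 0; i < text.length; i++) {
    // charCodeAt reads one 16-bit code unit at a time.
    units.push(text.charCodeAt(i).toString(16));
  }
  return units;
}

const emoji = "\u{1F600}"; // 😀 (U+1F600), outside the BMP

console.log(emoji.length);                      // 2 code units
console.log(codeUnitsHex(emoji));               // [ 'd83d', 'de00' ] (high and low surrogate)
console.log(emoji.codePointAt(0).toString(16)); // 1f600 (the single codepoint)
console.log([...emoji].length);                 // 1 (iteration is by codepoint)
```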

Question 3

What is the difference between UTF-8 and UTF-16 byte counts?

Accepted Answer

UTF-8 uses 1 to 4 bytes per codepoint, depending on the codepoint value: U+0000–U+007F (ASCII) use 1 byte, U+0080–U+07FF use 2 bytes, U+0800–U+FFFF use 3 bytes, and codepoints above U+FFFF (including most emoji) use 4 bytes. UTF-16 uses 2 bytes for each BMP codepoint (U+0000–U+FFFF) and 4 bytes, as a surrogate pair, for each codepoint above U+FFFF. For ASCII text, UTF-8 is half the size of UTF-16; for CJK-heavy text, UTF-16 is usually smaller, because most CJK characters take 3 bytes in UTF-8 but only 2 in UTF-16.
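
The range rules above translate almost directly into code. The sketch below is one way to compute both byte counts by hand; utf8Length and utf16Length are illustrative names, and the UTF-16 figure assumes no byte order mark is written.

```javascript
// Hypothetical helper: UTF-8 byte count derived from codepoint ranges.
function utf8Length(text) {
  let bytes = 0;
  for (const ch of text) {           // iterates by codepoint
    const cp = ch.codePointAt(0);
    if (cp <= 0x7f) bytes += 1;        // U+0000–U+007F
    else if (cp <= 0x7ff) bytes += 2;  // U+0080–U+07FF
    else if (cp <= 0xffff) bytes += 3; // U+0800–U+FFFF
    else bytes += 4;                   // above U+FFFF
  }
  return bytes;
}

// Hypothetical helper: UTF-16 byte count without a byte order mark.
// .length already counts 16-bit code units, so each one is 2 bytes.
function utf16Length(text) {
  return text.length * 2;
}

const cases = {
  ascii: "hello",
  cjk: "\u4E2D\u6587\u5B57", // 中文字
  emoji: "\u{1F600}",        // 😀
};

for (const [label, s] of Object.entries(cases)) {
  console.log(`${label}: UTF-8 = ${utf8Length(s)}, UTF-16 = ${utf16Length(s)}`);
}
// ascii: UTF-8 = 5, UTF-16 = 10
// cjk: UTF-8 = 9, UTF-16 = 6
// emoji: UTF-8 = 4, UTF-16 = 4
```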

Question 4

What does URL-encoded length mean?

Accepted Answer

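URL encoding (percent-encoding) replaces each byte of a character's UTF-8 encoding with a %XX sequence, where XX is the byte's hexadecimal value. Every encoded byte therefore becomes 3 characters: a space becomes %20, a 2-byte character such as é becomes 6 characters, and a 4-byte emoji becomes 12. The URL-encoded length shown is the length of the string after encodeURIComponent() is applied, which leaves ASCII letters, digits, and - _ . ! ~ * ' ( ) unchanged. This is useful for checking whether a string will fit within URL length limits, which vary by browser and server (roughly 2,000 to 8,000 characters is a commonly cited range).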
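
A short sketch of that measurement follows. urlEncodedLength and fitsInUrl are hypothetical helpers, and the default limit of 2000 is only an assumed conservative budget, not a value defined by any standard. Note that encodeURIComponent() throws a URIError if the string contains a lone surrogate.

```javascript
// Hypothetical helper: length of the string after percent-encoding.
function urlEncodedLength(text) {
  return encodeURIComponent(text).length;
}

// Hypothetical helper: check the encoded value against an assumed budget.
// The default of 2000 is an illustrative choice, not a specified limit.
function fitsInUrl(text, limit = 2000) {
  return urlEncodedLength(text) <= limit;
}

console.log(encodeURIComponent("a b"));       // a%20b          (5 characters)
console.log(encodeURIComponent("\u00E9"));    // %C3%A9         (6 characters)
console.log(encodeURIComponent("\u{1F600}")); // %F0%9F%98%80   (12 characters)
console.log(urlEncodedLength("safe-text_1.0")); // 13, nothing needs encoding
console.log(fitsInUrl("\u{1F600}".repeat(200))); // false: 200 * 12 = 2400 > 2000
```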

Question 5

What is a grapheme cluster?

Accepted Answer

A grapheme cluster is the smallest unit of text that a user perceives as a single character. It may consist of multiple Unicode codepoints: for example, a base letter plus a combining diacritic, an emoji with a skin tone modifier, or a complex script sequence. The grapheme count is what most people mean when they say 'character count' in a user-facing context, and it matches the number of characters a reader sees in a rendered text box.
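
One way to count grapheme clusters in JavaScript is the standard Intl.Segmenter API. The sketch below assumes a runtime that ships it (recent browsers and Node.js 16 or later); graphemeCount is a hypothetical helper, and whether the tool itself uses this API is not stated here.

```javascript
// Hypothetical helper: count user-perceived characters.
// Assumes Intl.Segmenter is available in the runtime.
function graphemeCount(text) {
  const segmenter = new Intl.Segmenter("en", { granularity: "grapheme" });
  // segment() returns an iterable of segments, one per grapheme cluster.
  return [...segmenter.segment(text)].length;
}

const accented = "e\u0301";           // "e" + COMBINING ACUTE ACCENT
const thumbs = "\u{1F44D}\u{1F3FD}";  // thumbs up + medium skin tone modifier

for (const s of [accented, thumbs]) {
  console.log(
    `${s}: ${s.length} code units, ${[...s].length} codepoints, ${graphemeCount(s)} grapheme(s)`
  );
}
```

Both samples contain two codepoints but are counted as a single grapheme cluster.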

String Length Calculator

Per-Character Encoding Breakdown

How to Use the String Length Calculator

Understanding String Length Metrics

Character Count vs. Grapheme Count

UTF-8 Byte Count

UTF-16 Byte Count

ASCII Byte Count

URL-Encoded Length

Base64 Length

Compare Mode

Frequently Asked Questions