Dev Tools · March 30, 2026

URL Encoding & Decoding Explained: A Developer Reference

By The hakaru Team · Last updated March 2026

Quick Answer

  • URL encoding (percent-encoding) replaces unsafe characters with %HH hex values so URLs remain valid across all systems.
  • Use encodeURIComponent for query parameter values; use encodeURI for full URLs.
  • Non-ASCII characters are first converted to UTF-8 bytes, then each byte is percent-encoded.
  • Only 66 characters are “unreserved” and never need encoding: A–Z, a–z, 0–9, - . _ ~

What Is URL Encoding?

URL encoding — formally called percent-encoding — is a mechanism defined in RFC 3986 for representing characters that are not allowed or have special meaning in a Uniform Resource Identifier (URI). It replaces each unsafe byte with a percent sign followed by two hexadecimal digits representing the byte's value.

For example, a space character (ASCII 0x20) becomes %20. An ampersand (&) becomes %26. Without encoding, these characters would break the URL's structure because parsers interpret them as delimiters.
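As a quick illustration of these substitutions, here is a minimal JavaScript sketch using the built-in encodeURIComponent and decodeURIComponent functions. The sample strings are made up for the example.

```javascript
// Percent-encode two sample values with the built-in encodeURIComponent.
const encodedSpace = encodeURIComponent("hello world");
const encodedAmp = encodeURIComponent("Tom & Jerry");

console.log(encodedSpace); // hello%20world
console.log(encodedAmp);   // Tom%20%26%20Jerry

// decodeURIComponent reverses the transformation.
console.log(decodeURIComponent(encodedAmp)); // Tom & Jerry
```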

Why URL Encoding Exists

URIs were designed with a limited character set. According to RFC 3986 (published by the IETF in January 2005), only 66 characters are classified as “unreserved” and can appear anywhere in a URI without encoding:

  • Uppercase letters: A through Z (26 characters)
  • Lowercase letters: a through z (26 characters)
  • Digits: 0 through 9 (10 characters)
  • Special characters: hyphen (-), period (.), underscore (_), tilde (~) (4 characters)

Everything else — spaces, Unicode characters, reserved delimiters — must be percent-encoded when used as data rather than as structural delimiters. According to the HTTP Archive's 2025 Web Almanac, roughly 38% of all URLs on the web contain at least one percent-encoded character.
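The unreserved rule can be sketched as a small encoder that escapes every byte outside the 66-character set. The helper name percentEncodeStrict is hypothetical and the function is only an illustration; in practice the built-in functions discussed later are usually sufficient.

```javascript
// The 66 unreserved characters from RFC 3986: letters, digits, - . _ ~
const UNRESERVED = /^[A-Za-z0-9\-._~]$/;

// Hypothetical helper: percent-encode every byte that is not unreserved.
function percentEncodeStrict(input) {
  let out = "";
  // Work on UTF-8 bytes so non-ASCII input is handled as well.
  for (const byte of new TextEncoder().encode(input)) {
    const ch = String.fromCharCode(byte);
    out += UNRESERVED.test(ch)
      ? ch
      : "%" + byte.toString(16).toUpperCase().padStart(2, "0");
  }
  return out;
}

console.log(percentEncodeStrict("A-z_0.9~")); // A-z_0.9~
console.log(percentEncodeStrict("a b&c"));    // a%20b%26c
```

Unlike encodeURIComponent, this sketch also escapes ! ' ( ) and *, because those characters are reserved in RFC 3986.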

Reserved vs Unreserved Characters

RFC 3986 defines 18 reserved characters that can serve as URI delimiters. The ones you will meet most often are:

Character | Encoded Form | Purpose in URLs
: | %3A | Scheme/port separator
/ | %2F | Path segment separator
? | %3F | Query string start
# | %23 | Fragment identifier
& | %26 | Query parameter separator
= | %3D | Key-value pair separator
@ | %40 | User info separator
+ | %2B | Historically represents space in forms

When these characters appear as data inside a path or query value, they must be encoded. When they serve their structural purpose, they remain unencoded.
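The difference between a delimiter and data can be seen in a short JavaScript sketch. The parameter name team and the value are invented for the example; URLSearchParams is the standard query-string parser available in browsers and Node.js.

```javascript
// A value that happens to contain the "&" and "=" delimiters.
const company = "R&D=fun";

// Unencoded: the parser sees an extra parameter boundary.
const broken = new URLSearchParams("team=" + company);
console.log([...broken.keys()]); // [ 'team', 'D' ]
console.log(broken.get("team")); // R

// Encoded: the delimiters inside the value are treated as data.
const safe = new URLSearchParams("team=" + encodeURIComponent(company));
console.log(safe.get("team")); // R&D=fun
```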

encodeURI vs encodeURIComponent

JavaScript provides two built-in functions for URL encoding, and choosing the wrong one is among the most common web development mistakes. According to a 2024 Stack Overflow analysis, questions about encoding confusion average over 2,300 views per day.

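Function | Encodes | Leaves Alone | Use Case
encodeURI() | Spaces, non-ASCII characters, and other characters not allowed in a URL | Unreserved characters plus ; , / ? : @ & = + $ # ! ' ( ) * | Encoding a full URL
encodeURIComponent() | Everything except A–Z a–z 0–9 - _ . ~ ! ' ( ) * | Unreserved characters plus ! ' ( ) * | Encoding a query value or path segment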

The rule of thumb: use encodeURIComponent for individual values (query params, path segments) and encodeURI only when you have a complete, well-formed URL that just needs non-ASCII characters escaped.
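The rule of thumb can be sketched side by side. The URL and the search value below are illustrative only.

```javascript
const value = "rock & roll/jazz?";
const fullUrl = "https://example.com/search?q=café&lang=fr";

// encodeURI keeps the URL's delimiters and escapes only what is invalid.
console.log(encodeURI(fullUrl));
// https://example.com/search?q=caf%C3%A9&lang=fr

// encodeURIComponent escapes delimiters too, so it suits a single value.
console.log(encodeURIComponent(value));
// rock%20%26%20roll%2Fjazz%3F

// Typical pattern: encode each value, then assemble the URL.
const url = "https://example.com/search?q=" + encodeURIComponent(value);
console.log(url);
// https://example.com/search?q=rock%20%26%20roll%2Fjazz%3F
```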

How UTF-8 Characters Are Encoded

Modern browsers follow RFC 3986 and encode non-ASCII characters using their UTF-8 byte representation. Each byte is then percent-encoded individually. According to the W3C Internationalization guidelines (updated 2024), UTF-8 accounts for 98.2% of all web pages, making it the de facto standard for URL encoding.

Character | Unicode | UTF-8 Bytes | Encoded
é | U+00E9 | 0xC3 0xA9 | %C3%A9
ü | U+00FC | 0xC3 0xBC | %C3%BC
中 | U+4E2D | 0xE4 0xB8 0xAD | %E4%B8%AD

A single emoji like 🚀 (U+1F680) requires four UTF-8 bytes and becomes %F0%9F%9A%80, which is 12 characters to represent one. This is why URLs with heavy Unicode content can get very long.
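To make the byte-by-byte process concrete, the sketch below lists the UTF-8 bytes of a character and compares the result with encodeURIComponent. The helper name utf8Hex is hypothetical; TextEncoder is the standard API for producing UTF-8 bytes.

```javascript
// Hypothetical helper: list the UTF-8 bytes of a string as hex pairs.
function utf8Hex(text) {
  return Array.from(new TextEncoder().encode(text), (b) =>
    b.toString(16).toUpperCase().padStart(2, "0")
  );
}

console.log(utf8Hex("é"));  // [ 'C3', 'A9' ]
console.log(utf8Hex("🚀")); // [ 'F0', '9F', '9A', '80' ]

// Prefixing each byte with "%" matches what encodeURIComponent produces.
const manual = utf8Hex("🚀").map((h) => "%" + h).join("");
console.log(manual === encodeURIComponent("🚀")); // true
console.log(manual.length); // 12
```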

The Space Character: %20 vs +

Spaces are the most commonly encoded character in URLs, and there are two conventions:

  • %20 — The standard percent-encoding per RFC 3986. Used in path segments and modern APIs.
  • + — The legacy convention from HTML form submissions (application/x-www-form-urlencoded, defined in the HTML spec since 1995). Only valid in query strings.

Google's search URLs, for example, use the + convention: q=url+encoding. Most REST APIs prefer %20. The application/x-www-form-urlencoded parser in the WHATWG URL Standard, which backs URLSearchParams, decodes both + and %20 in a query string as a space.
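Both conventions are visible in JavaScript's standard APIs. A short sketch, reusing the q=url+encoding example from above:

```javascript
// URLSearchParams serializes with the form convention: space -> "+".
const params = new URLSearchParams({ q: "url encoding" });
console.log(params.toString()); // q=url+encoding

// encodeURIComponent uses plain percent-encoding: space -> "%20".
console.log("q=" + encodeURIComponent("url encoding")); // q=url%20encoding

// When parsing a query string, URLSearchParams accepts both spellings.
console.log(new URLSearchParams("q=url+encoding").get("q"));   // url encoding
console.log(new URLSearchParams("q=url%20encoding").get("q")); // url encoding
```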

Common Encoding Pitfalls

Double Encoding

Double encoding occurs when an already-encoded string is encoded again. The % in %20 becomes %25, turning %20 into %2520. This is one of the most common URL encoding bugs. Always determine whether your input is raw data or already encoded before calling an encoding function.
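The effect is easy to reproduce. A minimal sketch with an arbitrary sample value:

```javascript
const raw = "a b";
const once = encodeURIComponent(raw);   // a%20b
const twice = encodeURIComponent(once); // a%2520b

console.log(once, twice);

// A single decode of the double-encoded value returns the encoded form,
// not the original data.
console.log(decodeURIComponent(twice)); // a%20b
console.log(decodeURIComponent(decodeURIComponent(twice))); // a b
```

Note that the string alone cannot tell you whether it is already encoded: "a%20b" could be encoded data or a raw value that literally contains a percent sign. Tracking that state in your code is more dependable than guessing.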

Encoding Entire URLs When You Should Not

Running encodeURIComponent on a full URL like https://example.com/path?q=test will encode the colon, slashes, question mark, and equals sign, completely breaking the URL. Use encodeURIComponent only on individual values.
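A short sketch of the failure, followed by one possible alternative that uses the standard URL API to place a value into the query. The value "test & more" is illustrative.

```javascript
const full = "https://example.com/path?q=test";

// Encoding the whole URL escapes its delimiters, so it no longer works as a link.
console.log(encodeURIComponent(full));
// https%3A%2F%2Fexample.com%2Fpath%3Fq%3Dtest

// Alternative: let the URL API encode just the value.
const u = new URL("https://example.com/path");
u.searchParams.set("q", "test & more");
console.log(u.href); // https://example.com/path?q=test+%26+more
```

Because searchParams follows the form convention, the spaces appear as + in the result.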

Forgetting to Encode in Server-Side Redirects

Concatenating user input into redirect URLs without encoding is both a bug and a security vulnerability. MITRE's CWE catalog tracks the resulting open-redirect weakness as CWE-601, and OWASP listed unvalidated redirects and forwards in its 2013 Top 10.
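One way to handle this is sketched below. The helper name buildLoginRedirect, the /login route, and the next parameter are all hypothetical, and the same-site check is an assumed policy rather than something encoding gives you for free.

```javascript
// Hypothetical helper: build a login redirect that carries a return path.
function buildLoginRedirect(returnTo) {
  // Assumed policy: accept only same-site paths such as "/dashboard".
  // Reject "//host", "/\host", and absolute URLs, which could leave the site.
  const isLocalPath = /^\/(?![\/\\])/.test(returnTo);
  const target = isLocalPath ? returnTo : "/";
  // Encode the value so its "?" and "&" cannot alter the outer URL.
  return "/login?next=" + encodeURIComponent(target);
}

console.log(buildLoginRedirect("/orders?id=7&tab=items"));
// /login?next=%2Forders%3Fid%3D7%26tab%3Ditems
console.log(buildLoginRedirect("https://evil.example/"));
// /login?next=%2F
```

Encoding fixes the parsing bug; the validation step is what addresses the open-redirect risk.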

Assuming ASCII-Only URLs

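Internationalized Domain Names (IDNs) and non-Latin query parameters are increasingly common. According to ICANN, over 11.7 million IDN domain registrations existed as of Q3 2025. Note that IDN hostnames are converted to ASCII with Punycode (the xn-- form) rather than percent-encoded, while non-ASCII paths and query values are percent-encoded as UTF-8 bytes.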
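The standard URL API handles both cases. In the sketch below the hostname and path are invented for illustration; hostnames are converted to their ASCII (Punycode) form, while query data is percent-encoded as UTF-8.

```javascript
// Illustrative URL with a non-ASCII hostname.
const u = new URL("https://münchen.example/suche");
u.searchParams.set("q", "中");

console.log(u.hostname);              // xn--mnchen-3ya.example
console.log(u.search);                // ?q=%E4%B8%AD
console.log(u.searchParams.get("q")); // 中
```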

Mixing Encoding Standards Across Systems

Different programming languages handle URL encoding slightly differently. Python's urllib.parse.quote uses %20 for spaces by default, while Java's URLEncoder.encode uses +. Always verify encoding behavior at system boundaries.
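One boundary issue that shows up in JavaScript is that decodeURIComponent does not treat + as a space. A minimal sketch of a decoder for form-style input follows; the helper name decodeFormValue is hypothetical.

```javascript
// Hypothetical helper for values produced by a form-style encoder
// (space written as "+"), such as the output of Java's URLEncoder.encode.
function decodeFormValue(encoded) {
  // decodeURIComponent leaves "+" untouched, so map it to a space first.
  // A literal plus sign arrives as %2B and is unaffected by the replace.
  return decodeURIComponent(encoded.replace(/\+/g, "%20"));
}

console.log(decodeURIComponent("url+encoding")); // url+encoding
console.log(decodeFormValue("url+encoding"));    // url encoding
console.log(decodeFormValue("1%2B1"));           // 1+1
```

Apply this only to input you know is form-encoded; in a value that uses the %20 convention, a + is a literal plus sign.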


Frequently Asked Questions

What is the difference between encodeURI and encodeURIComponent?

encodeURI encodes a full URI but leaves reserved characters like :, /, ?, #, and & intact. encodeURIComponent encodes everything except A–Z, a–z, 0–9, and - _ . ~ ! ' ( ) *, which makes it suitable for individual query parameter values. Use encodeURIComponent for values; use encodeURI for complete URLs.

Why do spaces become %20 or + in URLs?

RFC 3986 specifies %20 as the percent-encoded form of a space in URLs. The + convention for spaces comes from the application/x-www-form-urlencoded format used in HTML form submissions, which predates the RFC. Both are valid in query strings, but %20 is the standard for path segments.

How are non-ASCII characters like emoji encoded in URLs?

Non-ASCII characters are first converted to their UTF-8 byte sequence, then each byte is percent-encoded. A single emoji can require 4 UTF-8 bytes, resulting in 12 characters of percent-encoding (e.g., 🚀 becomes %F0%9F%9A%80).

Can double-encoding break my URLs?

Yes. Double-encoding happens when an already-encoded string is encoded again, turning %20 into %2520. This is a common bug in web applications. Always check whether your input is already encoded before applying encoding, or decode first then re-encode once.

What characters are safe to use in a URL without encoding?

RFC 3986 defines unreserved characters as A–Z, a–z, 0–9, hyphen (-), period (.), underscore (_), and tilde (~). These 66 characters never need percent-encoding. Reserved characters like : / ? # & = @ must be encoded when used outside their reserved purpose.