URL Decoding

To complement the code of my URL Encoding post, I've now developed a URIDecode routine.

It attempts to decode URIs that were percent-encoded according to RFC 3986. It also allows for some malformed percent-encoded URIs, i.e. those that contain characters outside the RFC's "unreserved" character set.

Here's the code. An explanation follows.

function URIDecode(const Str: string): string;

  // Counts number of '%' characters in a UTF8 string
  function CountPercent(const S: UTF8String): Integer;
  var
    Idx: Integer; // loops thru all octets of S
  begin
    Result := 0;
    for Idx := 1 to Length(S) do
      if S[Idx] = '%' then
        Inc(Result);
  end;

const
  // Declared locally so the routine is self-contained; the original may
  // define this constant elsewhere in the unit
  cPercent = AnsiChar('%');
resourcestring
  // Assumed wording: the original message text is not shown in this post
  rsEscapeError = 'String contains an invalid % escape sequence';
var
  SrcUTF8: UTF8String;  // input string as UTF-8
  SrcIdx: Integer;      // index into source UTF-8 string
  ResUTF8: UTF8String;  // output string as UTF-8
  ResIdx: Integer;      // index into result UTF-8 string
  Hex: string;          // hex component of % encoding
  ChValue: Integer;     // character ordinal value from a % encoding
begin
  // Convert input string to UTF-8
  SrcUTF8 := UTF8Encode(Str);
  // Size the decoded UTF-8 string: each three octet '%XX' sequence collapses
  // to a single octet, so the result is two octets shorter per '%'
  SetLength(ResUTF8, Length(SrcUTF8) - 2 * CountPercent(SrcUTF8));
  SrcIdx := 1;
  ResIdx := 1;
  // Process each octet of the source string
  while SrcIdx <= Length(SrcUTF8) do
  begin
    if SrcUTF8[SrcIdx] = cPercent then
    begin
      // % encoding: decode following two hex chars into required code point
      if Length(SrcUTF8) < SrcIdx + 2 then
        raise EConvertError.Create(rsEscapeError);  // malformed: too short
      Hex := '$' + string(SrcUTF8[SrcIdx + 1] + SrcUTF8[SrcIdx + 2]);
      if not TryStrToInt(Hex, ChValue) then
        raise EConvertError.Create(rsEscapeError);  // malformed: not valid hex
      ResUTF8[ResIdx] := AnsiChar(ChValue);
      Inc(ResIdx);
      Inc(SrcIdx, 3);
    end
    else
    begin
      // plain char or UTF-8 continuation character: copy unchanged
      ResUTF8[ResIdx] := SrcUTF8[SrcIdx];
      Inc(ResIdx);
      Inc(SrcIdx);
    end;
  end;
  // Convert back to native string type for result
  Result := UTF8ToString(ResUTF8);
end;

Internally, URIDecode operates on UTF-8 strings for both input and output.

This lets us deal easily with any multi-byte characters in the input. As already noted, there shouldn't be any such characters: a correctly encoded URI contains only '%' escapes and unreserved characters, all of which belong to the ASCII character set. However, we need to allow for badly encoded URIs that may contain characters outside this expected set.

UTF-8 also lets us perform an easy test for '%' characters in the input. Since '%' can never occur as any octet of a multi-byte UTF-8 sequence, we can simply test for the actual character without worrying about whether it is part of a multi-byte character.
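
To illustrate why the octet-by-octet test is safe, here is a minimal sketch, assuming a Unicode-enabled Delphi (2009 or later). The program name, the IsMultiByteOctet helper and the sample text are all hypothetical and are not part of UURIEncode.pas. It relies on the fact that every octet of a multi-byte UTF-8 sequence has its top bit set, so its value is $80 or above, whereas '%' is $25.

```pascal
program PercentByteCheck;  // hypothetical demo, not part of UURIEncode.pas

{$APPTYPE CONSOLE}

uses
  SysUtils;

// True if the octet belongs to a multi-byte UTF-8 sequence (lead or
// continuation octet): all such octets have the top bit set.
function IsMultiByteOctet(const Octet: AnsiChar): Boolean;
begin
  Result := Ord(Octet) >= $80;
end;

var
  Src: string;      // sample text containing non-ASCII characters
  S: UTF8String;    // the same text as UTF-8 octets
  Idx: Integer;     // loops through all octets of S
begin
  // U+00E9 and U+20AC each encode as more than one UTF-8 octet
  Src := '100%' + #$00E9 + #$20AC;
  S := UTF8Encode(Src);
  for Idx := 1 to Length(S) do
    if S[Idx] = '%' then
      Writeln(Idx, ': literal percent sign')
    else if IsMultiByteOctet(S[Idx]) then
      Writeln(Idx, ': $', IntToHex(Ord(S[Idx]), 2),
        ' (part of a multi-byte sequence)')
    else
      Writeln(Idx, ': plain ASCII');
end.
```

Only the fourth octet should be reported as a percent sign; the octets that make up the two non-ASCII characters can never be mistaken for one.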

We use UTF-8 for the output string because UTF-8 should have been used to encode the URI in the first place. This means percent-encoded octets may map onto the lead or continuation bytes of multi-byte UTF-8 sequences. Using any other string type would give erroneous results.
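
As a rough illustration of what goes wrong otherwise, the sketch below takes the two octets that '%C3%A9' decodes to and interprets them in two ways. It assumes a Unicode-enabled Delphi (2009 or later); the program name and the DecodeOctetByOctet helper are hypothetical and exist only to show the mistake being avoided.

```pascal
program OctetDecodeDemo;  // hypothetical demo, not part of UURIEncode.pas

{$APPTYPE CONSOLE}

uses
  SysUtils;

// Deliberately naive: treats every octet as a character in its own right
// instead of interpreting the octets together as UTF-8.
function DecodeOctetByOctet(const Octets: UTF8String): string;
var
  Idx: Integer;
begin
  Result := '';
  for Idx := 1 to Length(Octets) do
    Result := Result + Char(Ord(Octets[Idx]));
end;

var
  Octets: UTF8String;   // the octets produced by decoding '%C3%A9'
  Right, Wrong: string;
begin
  SetLength(Octets, 2);
  Octets[1] := AnsiChar($C3);
  Octets[2] := AnsiChar($A9);

  // Interpreting the octets together as UTF-8 gives a single character
  Right := UTF8ToString(Octets);
  // Interpreting them one at a time gives two unrelated characters
  Wrong := DecodeOctetByOctet(Octets);

  Writeln('As UTF-8:       ', Length(Right), ' character, U+',
    IntToHex(Ord(Right[1]), 4));
  Writeln('Octet by octet: ', Length(Wrong), ' characters');
end.
```

The UTF-8 interpretation yields the single character U+00E9, while the naive version yields two characters, which is the kind of erroneous result that building the output as a UTF8String avoids.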

The interim UTF-8 result is converted into the native string type before returning.

URIDecode has been added to UURIEncode.pas in my Delphi Doodlings repo. View the code.


  1. Hi, thank you so much :)

  2. Hi, I've tried it for decoding :
    Works perfectly !
    Thank you !

