URL Encoding
I've been reviewing the URI encoding code from the Code Snippets Database and I realised that it doesn't comply with RFC 3986.
So here's my first attempt at some compliant code.
According to the RFC:
"the data should first be encoded as octets according to the UTF-8 character encoding [STD63]; then only those octets that do not correspond to characters in the unreserved set should be percent-encoded."
So, we define the URIEncode function to operate on the UTF8String type. It's easy to encode UnicodeString and AnsiString into UTF-8 using the System unit's overloaded UTF8Encode functions. You can overload URIEncode to do the UTF-8 conversion, but I haven't done so here.
There's a nice shortcut we can take when URL encoding. Remember that unreserved characters are the only ones that are never percent-encoded. The set of unreserved characters is:
const cURLUnreservedChars = [ 'A'..'Z', 'a'..'z', '0'..'9', '-', '_', '.', '~' ];
All other characters are percent-encoded. But what about any UTF-8 continuation bytes? Well, by definition these have values >= $80 (they lie in the range $80..$BF), and all the unreserved characters have ordinal values < $80. This means that no legal continuation byte can be mistaken for an unreserved character.
Therefore any byte in the UTF-8 string can be treated the same regardless of whether it's a lead or continuation byte: i.e. we percent-encode it if it's not an unreserved character.
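To make the byte-value argument concrete, here is a minimal sketch of a console program that walks the octets of a short UTF-8 string and reports which ones are >= $80. The program name Utf8Octets and the sample text are my own choices for illustration, and I'm assuming a Unicode-enabled Delphi (2009 or later).

```pascal
program Utf8Octets; // hypothetical name, for illustration only

{$APPTYPE CONSOLE}

uses
  SysUtils;

var
  S: string;
  U: UTF8String;
  Ch: AnsiChar;
begin
  // 'A' is one octet in UTF-8 ($41); the euro sign U+20AC is three
  // octets ($E2 $82 $AC), i.e. one lead byte and two continuation bytes
  S := 'A'#$20AC;
  U := UTF8Encode(S);
  for Ch in U do
    if Ord(Ch) >= $80 then
      // Every octet of a multi-byte sequence lands here, so it can
      // never be a member of the unreserved set
      WriteLn('$', IntToHex(Ord(Ch), 2), ' belongs to a multi-byte sequence')
    else
      WriteLn('$', IntToHex(Ord(Ch), 2), ' is a single-byte (ASCII) character');
end.
```

Only the first octet falls below $80, so it is the only one that needs checking against the unreserved set; the other three are always percent-encoded.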
Here's the function:
// Assumes Defined(UNICODE)
function URIEncode(const S: UTF8String): string; overload;
var
  Ch: AnsiChar;
begin
  // Just scan the string an octet at a time looking for chars to encode
  Result := '';
  for Ch in S do
    if CharInSet(Ch, cURLUnreservedChars) then
      Result := Result + WideChar(Ch)
    else
      Result := Result + '%' + IntToHex(Ord(Ch), 2);
end;
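For anyone who wants to try this out, the following is a rough, self-contained sketch that wraps the function in a console program. The UnicodeString overload is my guess at how the UTF-8 conversion mentioned earlier might be added (it isn't part of the code above), and the program name and sample string are made up for the example. It assumes a Unicode-enabled Delphi (2009 or later).

```pascal
program URIEncodeDemo; // hypothetical name, for illustration only

{$APPTYPE CONSOLE}

uses
  SysUtils; // CharInSet and IntToHex

const
  cURLUnreservedChars = ['A'..'Z', 'a'..'z', '0'..'9', '-', '_', '.', '~'];

function URIEncode(const S: UTF8String): string; overload;
var
  Ch: AnsiChar;
begin
  // Scan the string an octet at a time looking for chars to encode
  Result := '';
  for Ch in S do
    if CharInSet(Ch, cURLUnreservedChars) then
      Result := Result + WideChar(Ch)
    else
      Result := Result + '%' + IntToHex(Ord(Ch), 2);
end;

// Assumed convenience overload: convert to UTF-8 first, then encode.
// The typed local variable makes sure the UTF8String overload is chosen.
function URIEncode(const S: UnicodeString): string; overload;
var
  U: UTF8String;
begin
  U := UTF8Encode(S);
  Result := URIEncode(U);
end;

var
  S: string;
begin
  // #$00E9 is e-acute, which is the two octets $C3 $A9 in UTF-8
  S := 'caf'#$00E9' & co';
  WriteLn(URIEncode(S)); // expected: caf%C3%A9%20%26%20co
end.
```

An AnsiString overload could presumably follow the same pattern, going through the matching UTF8Encode overload.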
This, and more similar routines, are available (and may even be evolving) in my Delphi Doodlings repo (see UURIEncode.pas).