URL Encoding
I've been reviewing the URI encoding code from the Code Snippets Database and I realised that it doesn't comply with RFC 3986.
So here's my first attempt at some compliant code.
According to the RFC:
"the data should first be encoded as octets according to the UTF-8 character encoding [STD63]; then only those octets that do not correspond to characters in the unreserved set should be percent-encoded."
So, we define the URIEncode function to operate on the UTF8String type. It's easy to encode UnicodeString and AnsiString into UTF-8 using the System unit's overloaded UTF8Encode functions. You can overload URIEncode to do the UTF-8 conversion, but I haven't done so here.
There's a nice shortcut we can take when URL encoding. Remember that only characters outside the unreserved set are percent-encoded. The set of unreserved characters is:
const cURLUnreservedChars = [ 'A'..'Z', 'a'..'z', '0'..'9', '-', '_', '.', '~' ];
All other characters are percent-encoded. But what about the bytes of multi-byte UTF-8 sequences? Well, by definition continuation bytes have values in the range $80..$BF, and the lead bytes of multi-byte sequences have values >= $C0. All the unreserved characters have ordinal values < $80. This means that no legal lead or continuation byte of a multi-byte sequence can be an unreserved character.
Therefore any byte in the UTF-8 string can be treated the same regardless of whether it's a lead or continuation byte: i.e. we percent-encode it if it's not an unreserved character.
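To make that argument concrete, here is a minimal sketch that inspects the octets of a single non-ASCII character. It assumes a Unicode Delphi with scoped unit names (XE2 or later); the program name and the IsUnreservedOctet helper are made up for illustration and are not part of the original code. U+00E9 (e with acute accent) encodes to the two octets $C3 $A9, and neither is in the unreserved set.

```pascal
program Utf8OctetCheck;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

const
  // Same unreserved set as defined in the post
  cURLUnreservedChars = ['A'..'Z', 'a'..'z', '0'..'9', '-', '_', '.', '~'];

// Hypothetical helper: True if the octet may appear unencoded in a URI
function IsUnreservedOctet(Ch: AnsiChar): Boolean;
begin
  Result := CharInSet(Ch, cURLUnreservedChars);
end;

var
  S: string;
  U: UTF8String;
  Ch: AnsiChar;
begin
  // U+00E9 is written as a character code to avoid source-file encoding issues
  S := #$00E9;
  U := UTF8Encode(S);
  // Report, for each octet, whether it is >= $80 and whether it is unreserved
  for Ch in U do
    WriteLn('$', IntToHex(Ord(Ch), 2), ' >= $80: ', Ord(Ch) >= $80,
      ', unreserved: ', IsUnreservedOctet(Ch));
end.
```

Both octets should report as >= $80 and not unreserved, which is why the encoder never needs to know where a multi-byte sequence starts or ends.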
Here's the function:
// Assumes Defined(UNICODE)
function URIEncode(const S: UTF8String): string; overload;
var
  Ch: AnsiChar;
begin
  // Just scan the string an octet at a time looking for chars to encode
  Result := '';
  for Ch in S do
    if CharInSet(Ch, cURLUnreservedChars) then
      Result := Result + WideChar(Ch)
    else
      Result := Result + '%' + IntToHex(Ord(Ch), 2);
end;
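As a rough sketch of the overload mentioned earlier, the program below adds a UnicodeString version that converts to UTF-8 and then calls the UTF8String routine. The overload and the program name are my own assumptions for illustration, not necessarily how the repo code does it, and the core routine is repeated only so the sketch compiles on its own. It assumes Delphi XE2 or later for the System.SysUtils unit name.

```pascal
program URIEncodeDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

const
  cURLUnreservedChars = ['A'..'Z', 'a'..'z', '0'..'9', '-', '_', '.', '~'];

// Core routine from the post, repeated so this sketch is self-contained
function URIEncode(const S: UTF8String): string; overload;
var
  Ch: AnsiChar;
begin
  Result := '';
  for Ch in S do
    if CharInSet(Ch, cURLUnreservedChars) then
      Result := Result + WideChar(Ch)
    else
      Result := Result + '%' + IntToHex(Ord(Ch), 2);
end;

// Hypothetical convenience overload: convert to UTF-8 first, then encode.
// A typed local avoids any overload ambiguity on UTF8Encode's result type.
function URIEncode(const S: UnicodeString): string; overload;
var
  U: UTF8String;
begin
  U := UTF8Encode(S);
  Result := URIEncode(U);
end;

var
  Input: UnicodeString;
begin
  // Space and U+00E9 must be encoded; 'a', 'b' and '~' pass through unchanged
  Input := 'a b~' + #$00E9;
  WriteLn(URIEncode(Input)); // prints a%20b~%C3%A9
end.
```

Passing a typed variable rather than a bare string literal keeps the compiler's choice of overload unambiguous.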
This, and other similar routines, are available (and may even be evolving) in my Delphi Doodlings repo. View the code (see UURIEncode.pas).