The Catch 22 with URIs

Henrik Holst henrik.holst at millistream.com
Tue Apr 3 17:38:53 EDT 2012


Dear list,

As I see it, there is a serious catch 22 with URI handling in neon.
Neon provides us with wonderful functions to properly escape paths
(ne_path_escape) and to properly parse URIs (ne_uri_parse).

However, it is here that the catch comes into play. The escape function
only handles paths, not full URIs, and the parse function requires a
properly escaped URI.

More than one application that uses neon accepts URIs from the user,
via some GUI, a config file or the command line. In any of these cases
it should not be uncommon for the application to receive a URI in the
form of "http://test.com/path with spaces".

ne_uri_parse() refuses to parse it since the path component contains
spaces, and if the naive application writer were to put the URI through
ne_path_escape() he would end up with
"http%3a//test.com/path%20with%20spaces", a string that ne_uri_parse()
still refuses to parse (and with good reason).

As I see it there are two ways out of this:

1) Relax ne_uri_parse() so that it doesn't break on spaces etc.
2) Implement a ne_uri_escape() or ne_path_lenient_escape() that is
more liberal in what it escapes.

The problem with #1 is that ne_uri_parse() tries to decode query and
fragment and that can be quite tricky to do if the path is not
properly escaped.

I have investigated how, for example, Wget and cURL do it, and they tend
to go with #2: they use a liberal escape function for URIs given on
the command line, in which spaces are transformed to %20 but already
existing hex-encodings like %20 are left untouched. For Wget the
following test cases are described:

   "http://abc.xyz/%20%3F%%36%31%25aa% a?a=%61+a%2Ba&b=b%26c%3Dc" ->
"http://abc.xyz/%20%3F%25%36%31%25aa%25%20a?a=%61+a%2Ba&b=b%26c%3Dc"
   "foo bar"         -> "foo%20bar"
   "foo%20bar"       -> "foo%20bar"
   "foo %20bar"      -> "foo%20%20bar"
   "foo%%20bar"      -> "foo%25%20bar"       (0x25 == '%')
   "foo%25%20bar"    -> "foo%25%20bar"
   "foo%2%20bar"     -> "foo%252%20bar"
   "foo+bar"         -> "foo+bar"            (plus is reserved!)
   "foo%2b+bar"      -> "foo%2b+bar"

I have no problem providing code for either solution, but would
appreciate it if anyone on this list has ideas on what the correct
way forward would be. Perhaps there is a good #3 that I've missed :-)
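
To make #2 a bit more concrete, here is a rough sketch of what such a
lenient escape could look like in plain C. The name
uri_lenient_escape() and the helper needs_escape() are made up for
this mail and are not existing neon API, and the set of characters
treated as unsafe is my own assumption rather than a copy of what Wget
uses. The idea is only to show the rule from the test cases above:
unsafe characters become %XX, an existing %XX is left alone, and a
stray '%' becomes %25.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns non-zero if the byte at p must be
 * percent-encoded. The unsafe set below is an assumption. */
static int needs_escape(const unsigned char *p)
{
    if (*p == '%') {
        /* Keep an existing "%XX" as is; a stray '%' must be encoded.
         * The && short-circuits, so we never read past the NUL. */
        return !(isxdigit(p[1]) && isxdigit(p[2]));
    }
    if (*p <= 0x20 || *p >= 0x7f) {
        return 1; /* controls, space, DEL and non-ASCII bytes */
    }
    return strchr("\"<>\\^`{|}", *p) != NULL;
}

/* Hypothetical lenient escape for a whole URI. Reserved characters
 * such as ':', '/', '?', '&', '=' and '+' are left untouched.
 * Returns a malloc()ed string, or NULL if allocation fails. */
char *uri_lenient_escape(const char *uri)
{
    static const char hex[] = "0123456789ABCDEF";
    const unsigned char *p;
    size_t len = 0;
    char *out, *q;

    /* First pass: compute the size of the escaped string. */
    for (p = (const unsigned char *)uri; *p; p++) {
        len += needs_escape(p) ? 3 : 1;
    }

    out = malloc(len + 1);
    if (out == NULL) {
        return NULL;
    }

    /* Second pass: copy, encoding where needed. */
    q = out;
    for (p = (const unsigned char *)uri; *p; p++) {
        if (needs_escape(p)) {
            *q++ = '%';
            *q++ = hex[*p >> 4];
            *q++ = hex[*p & 0x0f];
        } else {
            *q++ = (char)*p;
        }
    }
    *q = '\0';
    return out;
}
```

With something like this, an application would run the user-supplied
string through uri_lenient_escape() first and hand the result to
ne_uri_parse(), freeing the escaped copy afterwards. Since it never
decodes anything, running it twice on the same string gives the same
result as running it once.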

/Henrik Holst


