Bug 6294 - Non standard URL parsing for file: method (extra / required)
Status: ASSIGNED
Product: CoG jglobus
Component: utils
Version: unspecified
Platform: All Linux
Importance: P3 normal
Target Milestone: ---
Reported: 2008-08-07 14:46
Modified: 2008-08-28 13:01


Description From 2008-08-07 14:46:16
The problem was encountered in srmcp, which uses the CoG libraries to parse
URLs.
File URLs require an extra '/':
'file:///abs_path/file_name' fails to refer to '/abs_path/file_name' in a
POSIX file system, while 'file:////abs_path/file_name' does refer to it.

This differs from the behavior of C clients (like globus-url-copy) and
other user clients (grid-related or not).

According to RFC 1738, '/' is a special character in URLs.
In a URL like file://host/path1/path2/file, the first two slashes separate the
scheme from the rest, the third separates the host from the path, and the
remaining ones are part of the URL path and separate hierarchy levels (the
path is not a POSIX path, even if it looks similar).
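
To make the two readings concrete, here is a minimal Python sketch of the RFC 1738 split described above. It is not the CoG code; the function names are made up for illustration. `literal_path` mirrors the behavior reported here for the CoG parser, and `absolute_path` mirrors what the reporter expects (as in globus-url-copy).

```python
def split_file_url(url):
    """Split 'file://<host>/<path>' into (host, url_path).

    The slash after the host is a separator and is not part of url_path.
    """
    prefix = "file://"
    if not url.startswith(prefix):
        raise ValueError("not a file:// URL: %r" % url)
    rest = url[len(prefix):]
    host, sep, url_path = rest.partition("/")
    if not sep:
        raise ValueError("missing '/' after host: %r" % url)
    return host, url_path


def literal_path(url):
    # Reading reported for the CoG parser: use url_path as is,
    # so an absolute path needs a fourth slash.
    return split_file_url(url)[1]


def absolute_path(url):
    # Reading the reporter expects: the path is always rooted at '/'.
    return "/" + split_file_url(url)[1]
```

With this sketch, `literal_path("file:///abs_path/file_name")` gives the relative `abs_path/file_name`, while `absolute_path` on the same URL gives `/abs_path/file_name`.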
------- Comment #1 From 2008-08-07 15:31:35 -------
(In reply to comment #0)
> 
> File URLs require an extra '/'.
> 'file:///abs_path/file_name' is failing to refer to '/abs_path/file_name' in a
> POSIX file system, 'file:////abs_path/file_name' is referring to it

I thought that both file:///abs_path/file_name and file:/abs_path/file_name
were valid URIs, but that file://abs_path/file_name and
file:////abs_path/file_name were not.
------- Comment #2 From 2008-08-28 12:32:57 -------
It is not clear to me what you want.  Current jglobus has:

file:///path/file

refer to: path/file, and:

file:////path/file

refer to /path/file.  

From the RFC that you posted, this seems correct.  Three slashes are needed: two
for the scheme separator :// and then one for the empty-host separator; from
there you start the path.  Sure, it is quite a lot of slashes, and I imagine that
is why there are so many other conventions for file URLs, but I think it is
compliant.  Is there some optional or additional behavior that you are looking
for?
------- Comment #3 From 2008-08-28 12:44:17 -------
I think Marco's point is that the RFC treats "/" as a separator when it is
present before the path, that it should not be considered meaningful beyond
that, and that all paths, whether prefixed with a slash or not, should be
absolute. Also, the FTP URL RFC mentions that an absolute path should be
prefixed with %2f (presumably if the default FTP directory is different from
the root of the FS).

E.g. 
file:///etc/passwd should point to /etc/passwd.
ftp://host/etc/passwd should point to FTP_ROOT/etc/passwd
ftp://host/%2fetc/passwd should point to /etc/passwd

This is somewhat tricky because it makes it more difficult to express "relative
paths" in a uniform way, something that seems not to have been considered much
in the RFCs.
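
The FTP examples above can be sketched in Python as follows. This is only an illustration of the %2f convention, not code from any Globus client: `ftp_target` and its `ftp_root` parameter (standing in for FTP_ROOT) are hypothetical names. Each path segment is decoded on its own, so an encoded slash at the start of a segment restarts the path from the root.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def ftp_target(url, ftp_root):
    """Return the server-side path that an ftp:// URL names.

    ftp_root is a hypothetical login directory (FTP_ROOT in the text).
    """
    parts = urlsplit(url)
    if parts.scheme != "ftp":
        raise ValueError("not an ftp:// URL: %r" % url)
    # Drop the separator slash after the host, split on the remaining
    # literal slashes, then percent-decode each segment separately.
    segments = [unquote(s) for s in parts.path[1:].split("/")]
    current = ftp_root
    for segment in segments:
        # posixpath.join discards 'current' when 'segment' is absolute,
        # which is what a decoded '%2fetc' ('/etc') needs.
        current = posixpath.join(current, segment)
    return current
```

Assuming a login directory of `/home/ftp`, `ftp://host/etc/passwd` resolves under that directory, while `ftp://host/%2fetc/passwd` resolves to `/etc/passwd`.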
------- Comment #4 From 2008-08-28 13:01:51 -------
It is difficult to make a change to this parsing code after CoG has existed for
so long.  Does anyone know the conventions or assumptions most users make?  We
could overload the parsing functions such that a new method includes a flag
specifying file URL parsing behavior.  Would that work for your situation?
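
A rough Python sketch of what such a flag could look like. The names `parse_file_url` and `force_absolute` are hypothetical and do not exist in jglobus. The default keeps the current behavior from comment #2, so existing callers would be unaffected.

```python
def parse_file_url(url, force_absolute=False):
    """Return the local path named by a file:// URL.

    force_absolute is a hypothetical flag: False keeps the behavior
    described in comment #2, True always roots the path at '/'.
    """
    prefix = "file://"
    if not url.startswith(prefix):
        raise ValueError("not a file:// URL: %r" % url)
    host, sep, path = url[len(prefix):].partition("/")
    if not sep:
        raise ValueError("missing '/' after host: %r" % url)
    # With the flag set, add the leading slash only when it is missing,
    # so the three-slash and four-slash forms name the same file.
    if force_absolute and not path.startswith("/"):
        path = "/" + path
    return path
```

With the flag set, both `file:///path/file` and `file:////path/file` would map to `/path/file`, so URLs written for the current convention would keep working.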