Bugzilla – Bug 6294
Non standard URL parsing for file: method (extra / required)
Last modified: 2008-08-28 13:01:51
The problem was encountered in srmcp, which uses the CoG libraries to parse URLs. File URLs require an extra '/': 'file:///abs_path/file_name' fails to refer to '/abs_path/file_name' in a POSIX file system, while 'file:////abs_path/file_name' does refer to it. This differs from the behavior of C clients (like globus-url-copy) and other user clients (grid related or not).

According to RFC 1738, '/' is a special character in URLs. In a URL like file://host/path1/path2/file, the first two slashes separate the scheme, the third separates the host, and the remaining ones are part of the file URL specification and separate hierarchy levels (the path is not a POSIX path even if it looks similar).
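For comparison, here is a minimal sketch of how a general-purpose parser splits these URLs. It uses java.net.URI from the JDK, not the CoG parser, so it only illustrates the host/path decomposition described above; the class and method names (FileUrlParts, split) are made up for this example.

```java
import java.net.URI;

class FileUrlParts {
    /**
     * Returns {authority, path} as java.net.URI reports them.
     * The authority is returned as "" when the URL has none.
     */
    static String[] split(String url) {
        URI u = URI.create(url);
        String authority = (u.getAuthority() == null) ? "" : u.getAuthority();
        return new String[] { authority, u.getPath() };
    }

    public static void main(String[] args) {
        String[] samples = {
            "file://host/path1/path2/file",
            "file:///abs_path/file_name",
            "file:/abs_path/file_name",
            "file:////abs_path/file_name"
        };
        for (String url : samples) {
            String[] parts = split(url);
            System.out.println(url + " -> authority='" + parts[0] + "' path='" + parts[1] + "'");
        }
    }
}
```

With this parser the three-slash and single-slash forms both yield the path /abs_path/file_name, while the four-slash form keeps a doubled leading slash in the path.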
(In reply to comment #0)
>
> File URLs require an extra '/'.
> 'file:///abs_path/file_name' is failing to refer to '/abs_path/file_name' in a
> POSIX file system, 'file:////abs_path/file_name' is referring to it

I thought that both file:///abs_path/file_name and file:/abs_path/file_name were valid URIs, but that file://abs_path/file_name and file:////abs_path/file_name were not.
It is not clear to me what you want. Current jglobus has file:///path/file refer to path/file, and file:////path/file refer to /path/file. From the RFC that you posted this seems correct. Three slashes are needed: two for the scheme separator :// and one for the empty host separator; from there the path starts. Sure, that is quite a lot of slashes, and I imagine that is why there are so many other conventions for file URLs, but I think it is compliant. Is there some optional or additional behavior that you are looking for?
I think Marco's point is that the RFC mentions that "/" is a separator if present before the path and that it should not be considered meaningful beyond that, and that all paths, whether prefixed with a slash or not, should be absolute. Also, the FTP URL RFC mentions that an absolute path should be prefixed with %2f (presumably if the default FTP directory is different from the root of the file system). E.g.:

  file:///etc/passwd should point to /etc/passwd
  ftp://host/etc/passwd should point to FTP_ROOT/etc/passwd
  ftp://host/%2fetc/passwd should point to /etc/passwd

This is somewhat tricky because it makes it more difficult to express "relative paths" in a uniform way, something that seems not to have been considered much in the RFCs.
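The two ftp cases above can be sketched roughly as follows. This is only an illustration of the %2f convention, not code from CoG or any FTP client; FtpPathDemo, resolve and the ftpRoot parameter are hypothetical names, and the sketch does not decode any other percent escapes.

```java
import java.net.URI;

class FtpPathDemo {
    /**
     * Hypothetical helper: maps an ftp URL to a server-side path.
     * ftpRoot stands in for FTP_ROOT, the directory the server starts in.
     */
    static String resolve(String url, String ftpRoot) {
        // Raw (still percent-encoded) path, e.g. "/%2fetc/passwd".
        String raw = URI.create(url).getRawPath();
        // The first '/' only separates the host from the url-path.
        String rest = raw.startsWith("/") ? raw.substring(1) : raw;
        // A leading encoded slash (%2f or %2F) means "start at the file system root".
        if (rest.regionMatches(true, 0, "%2f", 0, 3)) {
            return "/" + rest.substring(3);
        }
        // Otherwise the path is taken relative to the FTP root.
        return ftpRoot + "/" + rest;
    }

    public static void main(String[] args) {
        // "/srv/ftp" is an arbitrary example value for FTP_ROOT.
        System.out.println(resolve("ftp://host/etc/passwd", "/srv/ftp"));
        System.out.println(resolve("ftp://host/%2fetc/passwd", "/srv/ftp"));
    }
}
```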
It is difficult to change this parsing code after CoG has existed for so long. Does anyone know the conventions or assumptions most users make? We could overload the parsing functions so that a new method takes a flag specifying the file URL parsing behavior. Would that work for your situation?
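To make the overloading idea concrete, here is one possible shape for it. This is a sketch only and not the existing CoG API: FileUrlPaths, Mode and toLocalPath are hypothetical names. LEGACY reproduces the behavior described earlier in this thread (file:///path/file gives path/file), and ABSOLUTE always treats the URL path as starting at the file system root.

```java
class FileUrlPaths {
    /** Hypothetical flag selecting how the path after the host is interpreted. */
    enum Mode { LEGACY, ABSOLUTE }

    /** Existing callers keep the current behavior by default. */
    static String toLocalPath(String url) {
        return toLocalPath(url, Mode.LEGACY);
    }

    /** New overload: the caller chooses the interpretation explicitly. */
    static String toLocalPath(String url, Mode mode) {
        String prefix = "file://";
        if (!url.startsWith(prefix)) {
            throw new IllegalArgumentException("not a file:// URL: " + url);
        }
        // What remains is "<host>/<url-path>", where the host may be empty.
        String afterScheme = url.substring(prefix.length());
        int slash = afterScheme.indexOf('/');
        if (slash < 0) {
            throw new IllegalArgumentException("missing '/' after host: " + url);
        }
        String urlPath = afterScheme.substring(slash + 1);
        // LEGACY returns the url-path untouched; ABSOLUTE anchors it at the root.
        return (mode == Mode.ABSOLUTE) ? "/" + urlPath : urlPath;
    }
}
```

Keeping the one-argument method on the legacy behavior means existing callers are unaffected, while a client such as srmcp could opt in with toLocalPath(url, FileUrlPaths.Mode.ABSOLUTE).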