<?xml version="1.0" standalone="yes" ?>
<!DOCTYPE bugzilla SYSTEM "http://bugzilla.globus.org/bugzilla/bugzilla.dtd">

<bugzilla version="3.2.3"
          urlbase="http://bugzilla.globus.org/bugzilla/"
          maintainer="bacon@mcs.anl.gov"
>

    <bug>
          <bug_id>6294</bug_id>
          
          <creation_ts>2008-08-07 14:46</creation_ts>
          <short_desc>Non-standard URL parsing for file: method (extra / required)</short_desc>
          <delta_ts>2008-08-28 13:01:51</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>1</classification_id>
          <classification>Unclassified</classification>
          <product>CoG jglobus</product>
          <component>utils</component>
          <version>unspecified</version>
          <rep_platform>All</rep_platform>
          <op_sys>Linux</op_sys>
          <bug_status>ASSIGNED</bug_status>
          
          
          
          
          
          <priority>P3</priority>
          <bug_severity>normal</bug_severity>
          <target_milestone>---</target_milestone>
          
          
          
          <everconfirmed>1</everconfirmed>
          <reporter name="Marco Mambelli">marco@hep.uchicago.edu</reporter>
          <assigned_to name="John Bresnahan">bresnaha@mcs.anl.gov</assigned_to>
          <cc>jglobus-dev@globus.org</cc>

      

      
          <long_desc isprivate="0">
            <who name="Marco Mambelli">marco@hep.uchicago.edu</who>
            <bug_when>2008-08-07 14:46:16</bug_when>
            <thetext>The problem was encountered in srmcp, which uses the CoG libraries to parse URLs.
File URLs require an extra &apos;/&apos;.
&apos;file:///abs_path/file_name&apos; fails to refer to &apos;/abs_path/file_name&apos; in a POSIX file system, whereas &apos;file:////abs_path/file_name&apos; does refer to it.

This differs from the behavior of C clients (like globus-url-copy) and other user clients (grid-related or not).

According to RFC 1738, &apos;/&apos; is a special character in URLs.
In a URL like file://host/path1/path2/file, the first two slashes separate the method (scheme), the third separates the host, and the remaining ones are part of the file URL specification and separate hierarchy levels (the path is not a POSIX path, even if it looks similar).</thetext>
          </long_desc>
          <long_desc isprivate="0">
            <who name="Tom Scavo">trscavo@gmail.com</who>
            <bug_when>2008-08-07 15:31:35</bug_when>
            <thetext>(In reply to comment #0)
&gt; 
&gt; File URLs require an extra &apos;/&apos;.
&gt; &apos;file:///abs_path/file_name&apos; is failing to refer to &apos;/abs_path/file_name&apos; in a
&gt; POSIX file system, &apos;file:////abs_path/file_name&apos; is referring to it

I thought that both file:///abs_path/file_name and file:/abs_path/file_name were valid URIs, but that file://abs_path/file_name and file:////abs_path/file_name were not.</thetext>
          </long_desc>
          <long_desc isprivate="0">
            <who name="John Bresnahan">bresnaha@mcs.anl.gov</who>
            <bug_when>2008-08-28 12:32:57</bug_when>
            <thetext>It is not clear to me what you want.  Current jglobus has:

file:///path/file

refer to: path/file, and:

file:////path/file

refer to /path/file.  

From the RFC that you posted, this seems correct.  Three slashes are needed: two for the scheme separator :// and one for the empty-host separator; the path starts from there.  Sure, it is quite a lot of slashes, and I imagine that is why there are so many other conventions for file URLs, but I think it is compliant.  Is there some optional or additional behavior that you are looking for?</thetext>
          </long_desc>
          <long_desc isprivate="0">
            <who name="Mihael Hategan">hategan@mcs.anl.gov</who>
            <bug_when>2008-08-28 12:44:17</bug_when>
            <thetext>I think Marco&apos;s point is that the RFC mentions that &quot;/&quot; is a separator if present before the path and that it should not be considered meaningful beyond that, and that all paths, whether prefixed with a slash or not, should be absolute. Also, the FTP URL RFC mentions that an absolute path should be prefixed with %2f (presumably if the default ftp dir is different from the root of the FS).

E.g. 
file:///etc/passwd should point to /etc/passwd.
ftp://host/etc/passwd should point to FTP_ROOT/etc/passwd
ftp://host/%2fetc/passwd should point to /etc/passwd
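
The three mappings above can be sketched roughly as follows. This is only an illustration in Python using the standard urllib.parse module, not CoG code; resolve_path and the FTP_ROOT value are hypothetical names chosen for the example.

```python
from urllib.parse import urlsplit, unquote

FTP_ROOT = "/home/ftp"  # hypothetical default FTP directory, for illustration only


def resolve_path(url, ftp_root=FTP_ROOT):
    """Map a file: or ftp: URL to a filesystem path under the reading above."""
    parts = urlsplit(url)
    # Drop the single "/" that only separates the host from the url-path.
    url_path = parts.path[1:] if parts.path.startswith("/") else parts.path
    if parts.scheme == "file":
        # file: paths are treated as absolute in every case.
        return "/" + unquote(url_path)
    if parts.scheme == "ftp":
        # A leading %2f marks a path anchored at the filesystem root.
        if url_path.lower().startswith("%2f"):
            return unquote(url_path)
        # Otherwise the path is relative to the FTP default directory.
        return ftp_root.rstrip("/") + "/" + unquote(url_path)
    raise ValueError("unsupported scheme: " + parts.scheme)


print(resolve_path("file:///etc/passwd"))
print(resolve_path("ftp://host/etc/passwd"))
print(resolve_path("ftp://host/%2fetc/passwd"))
```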

This is somewhat tricky because it makes it more difficult to express &quot;relative paths&quot; in a uniform way, something that seems not to have been considered much in the RFCs.</thetext>
          </long_desc>
          <long_desc isprivate="0">
            <who name="John Bresnahan">bresnaha@mcs.anl.gov</who>
            <bug_when>2008-08-28 13:01:51</bug_when>
            <thetext>It is difficult to make a change to this parsing code after CoG has existed for so long.  Does anyone know the conventions or assumptions most users make?  We could overload the parsing functions so that a new method takes a flag specifying the file URL parsing behavior.  Would that work for your situation?
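
A minimal sketch of what such a flag might look like, written in Python rather than the Java used by CoG. The names parse_file_url and strict_rfc are hypothetical, and the legacy branch only mirrors the behavior described in this report, not the actual jglobus implementation.

```python
def parse_file_url(url, strict_rfc=False):
    """Return the local path named by a file: URL with an empty host."""
    prefix = "file://"
    if not url.startswith(prefix):
        raise ValueError("not a file:// URL: " + url)
    rest = url[len(prefix):]  # "host/url-path", with an empty host here
    host, sep, url_path = rest.partition("/")
    if host or not sep:
        raise ValueError("this sketch handles only an empty host: " + url)
    if strict_rfc:
        # RFC 1738 reading: the url-path is always absolute.
        return "/" + url_path
    # Legacy reading: the text after the third slash is used verbatim,
    # so an absolute path needs a fourth slash.
    return url_path


print(parse_file_url("file:///path/file"))                   # legacy reading
print(parse_file_url("file:////path/file"))                  # legacy reading
print(parse_file_url("file:///path/file", strict_rfc=True))  # RFC reading
```

Keeping the legacy reading as the default would leave existing callers unaffected, while srmcp could opt in to the RFC reading.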
</thetext>
          </long_desc>
      
      

    </bug>

</bugzilla>