Bugzilla – Bug 5607
Assertion in libglobus_gridftp_server
Last modified: 2008-01-17 00:23:50
David Smith from the LCG reported a potential problem in GridFTP to the VDT.
Can you help us evaluate if this is a bug, and if it is, how it can be fixed?
The problem is in Globus 4.0.5. Here is his description:
> We noticed that with our DPM DSI (gridftp2) plugin, if a transfer
> which is being received (i.e. written to local storage by the DSI
> plugin) in extended mode has to be aborted early by calling
> globus_gridftp_server_finished_transfer() with a non-success result,
> then when using the debug version of the globus libraries we have an
> assertion from:
> globus_l_gfs_data_trev_kickout(), line approx. 4222, source file
> globus_i_gfs_data.c (from
> vdt_globus_data_server-VDT1.6.0x86_64_rhas_4-3). This corresponds to
> the assertion in the switch() block at:
>     case GLOBUS_L_GFS_DATA_FINISH:
>         pass = GLOBUS_FALSE;
>     case GLOBUS_L_GFS_DATA_COMPLETING:
>     case GLOBUS_L_GFS_DATA_COMPLETE:
>     case GLOBUS_L_GFS_DATA_REQUESTING:
>         globus_assert(0 && "possibly memory corruption");
> The value of the switch variable here was
> GLOBUS_L_GFS_DATA_FINISH_WITH_ERROR, which is matched by the default:
> clause. Could you please check whether this was the desired behavior?
> (My guess is that it was intended to be handled the same way as
> GLOBUS_L_GFS_DATA_FINISH at this point.)
If he is right about the intended behavior, would you be able to supply us with
an advisory or patch so we can provide a new version of GridFTP to LCG?
I can't seem to reproduce this. Does that assertion always happen for you on a
finished_transfer() with an error? Is that build a threaded flavor?
The suggested fix sounds reasonable, but I'd like to trigger this myself before
I sign off on it.
After looking at this further, I can see the problem. It is a timing issue with
respect to returning an error at the same time as a transfer marker (restart or
performance marker) is supposed to be generated. I wasn't able to reproduce
this in the standard file DSI, but I was able to trigger the race directly in
the debugger. Your guess as to the fix was correct.
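For anyone following along, here is a minimal sketch of the shape of the problem and of the suggested fix. It is not the actual code from globus_i_gfs_data.c. The enum values, the STATE_TRANSFERRING state, the result type, and both function names are hypothetical stand-ins chosen for illustration. The assumption is only what the report states: a marker callback can run after the handle has moved to the finish-with-error state, and that state had no case label of its own.

```c
/* Hypothetical model of the marker "kickout" state check.
 * None of these names come from the Globus sources. */

typedef enum {
    STATE_REQUESTING,
    STATE_TRANSFERRING,        /* assumed "transfer active" state */
    STATE_FINISH,
    STATE_FINISH_WITH_ERROR,
    STATE_COMPLETING,
    STATE_COMPLETE
} data_state_t;

typedef enum {
    KICKOUT_SEND_MARKER,       /* transfer active: emit the marker */
    KICKOUT_SKIP_MARKER,       /* transfer finishing: quietly drop it */
    KICKOUT_ASSERT             /* state treated as impossible here */
} kickout_result_t;

/* Before the fix: FINISH_WITH_ERROR has no case label, so it lands in
 * the default branch together with the states that really are
 * unexpected at this point. */
kickout_result_t kickout_before_fix(data_state_t state)
{
    switch (state) {
        case STATE_TRANSFERRING:
            return KICKOUT_SEND_MARKER;
        case STATE_FINISH:
            return KICKOUT_SKIP_MARKER;
        case STATE_COMPLETING:
        case STATE_COMPLETE:
        case STATE_REQUESTING:
        default:
            return KICKOUT_ASSERT;
    }
}

/* After the fix: FINISH_WITH_ERROR is handled like FINISH, so a marker
 * callback that races with an error result is simply skipped. */
kickout_result_t kickout_after_fix(data_state_t state)
{
    switch (state) {
        case STATE_TRANSFERRING:
            return KICKOUT_SEND_MARKER;
        case STATE_FINISH:
        case STATE_FINISH_WITH_ERROR:
            return KICKOUT_SKIP_MARKER;
        case STATE_COMPLETING:
        case STATE_COMPLETE:
        case STATE_REQUESTING:
        default:
            return KICKOUT_ASSERT;
    }
}
```

In this model, kickout_before_fix(STATE_FINISH_WITH_ERROR) yields KICKOUT_ASSERT, which corresponds to the reported assertion in debug builds, while kickout_after_fix() yields KICKOUT_SKIP_MARKER for the same input. The other states behave identically in both versions.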
The fix has been committed to the 4.0 branch and trunk. An update package can
be found here: