Bug 5415 - WorkspacePersistenceDB not updated after workspace --shutdown

Status: RESOLVED FIXED
Product: Nimbus
Component: Workspace service
Version: TP1.2.3
Hardware/OS: PC Linux
Importance: P3 normal

Reported: 2007-06-28 17:25
Modified: 2007-10-10 14:08

Description From 2007-06-28 17:25:30
When terminating a workspace with the workspace --shutdown parameter, the
WorkspacePersistenceDB does not appear to be updated with the currently
available resources.  The client receives the following error message after
trying to deploy the same image (same specifications) that was just terminated
with --shutdown:

------
Error:
------


Scheduling problem:
Error creating workspace resource:  [Caused by: No resource pool has an
applicable entry]

In addition, the /opt/workspace/secureimages directory on the worker node is
left with the propagated image, and the client's terminal never receives the
termination notice, leaving it hanging.

Similar problems are encountered when the Shutdown time is left to expire while
a workspace is running.
------- Comment #1 From 2007-06-28 21:30:39 -------
The slot will be occupied until WSRF resource termination passes.  Shutdown
moves the VM back to the Propagated state; you could, for example, call --start
on it at this point.

Or did you also call --destroy and see this behavior?  That would be a
regression.
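
In other words, the intended lifecycle is roughly the following (flags as used
in this thread; the remaining client arguments are elided):

    workspace --shutdown ...   # VM moves back to the Propagated state; slot still held
    workspace --start ...      # redeploy the VM from the Propagated state
    workspace --destroy ...    # destroy the WSRF resource; the slot is freed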
------- Comment #2 From 2007-06-29 19:00:04 -------

Thank you for clarifying that the slot is maintained until WSRF resource
termination passes.  However, further inspection suggests that the
WorkspacePersistenceDB is not being updated even long after the resource
termination time has expired.  I have tried both using the --shutdown
parameter and allowing the shutdown time to expire.  In both cases, after the
WSRF resource termination passes, the resources do not appear to be freed up.
This is true for both local and propagated images.

Also, to confirm: when --destroy is used to terminate a workspace, the
WorkspacePersistenceDB seems to be properly updated.

I have also tried to --start a workspace that I had just --shutdown, before
the resource termination time expired, and the terminal comes back with:

  Using endpoint:
Address: https://142.104.60.49:8443/wsrf/services/WorkspaceService
Reference property[0]:
<ns2:WorkspaceKey 
xmlns:ns2="http://www.globus.org/2006/08/workspace">1</ns2:WorkspaceKey>

Started.

However, querying the worker node with xm list shows that the workspace 
has not been restarted.




------- Comment #3 From 2007-07-02 22:55:13 -------
Thanks for looking into this, will confirm and fix shortly.
------- Comment #4 From 2007-07-02 23:39:55 -------
I can't recreate this here with TP1.2.3, though I am not using a real Xen
backend; I will try that tomorrow.

Is it possible to get logs from you?  If so, you can send them to me privately
if there is information you would not like to be archived, like IP addresses.
------- Comment #5 From 2007-07-03 00:31:35 -------
Is your default shutdown method set to trash?  Bug 5345 could account for some
of these issues if so.
------- Comment #6 From 2007-07-03 12:11:43 -------
(In reply to comment #5)
> Is your default shutdown method set to trash?  Bug 5345 could account for some
> of these issues if so.
> 

Where do I set the default shutdown method and what are my choices?  Is this
covered anywhere in documentation?
------- Comment #7 From 2007-07-03 14:44:35 -------
Oh, this interface page SHOULD have this but it doesn't; it was introduced in
TP1.2.  I will update the documentation for the next release.

http://workspace.globus.org/vm/TP1.2.3/interfaces/deployment.html

It's an optional part of the deployment request:

            <xs:element ref="DeploymentTime"/>
            <xs:element ref="WorkspaceState"/>
            <xs:element ref="ResourceAllocation"/>
            <xs:element ref="ShutdownMechanism" minOccurs="0"/>

The only supported value is "Trash"; otherwise, leave it out for normal
shutdown.
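
As a rough sketch, a deployment request that discards the VM on shutdown would
include the element like so (the other element contents are elided
placeholders; only the "Trash" value comes from this thread):

            <DeploymentTime>...</DeploymentTime>
            <WorkspaceState>...</WorkspaceState>
            <ResourceAllocation>...</ResourceAllocation>
            <ShutdownMechanism>Trash</ShutdownMechanism>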

This feature was added to support the situation where many VMs are started off
of the same template image, but it is unnecessary to unpropagate them and save
the "result".  The prime example is an image that is used as a node in a pool
of worker nodes driven by a batch system, mirroring the situation on most grid
sites but dynamically adding and removing worker nodes.
------- Comment #8 From 2007-10-10 14:08:05 -------
I don't see this issue in TP1.3.0 (available shortly).

There was one bug I know of in the TP1.2.3 code that is related to this.

In ResourcePoolUtil#retireMem(), there was this line:

    replaceResourcepoolEntry(hostname, entry);

which should instead read:

    replaceResourcepoolEntry(poolname, entry);

This update is in workspace_tp_1_3_0_branch.
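
For the record, here is a hypothetical sketch of why the wrong key matters
(invented names, with a map standing in for the persistence DB; this is not
the actual TP1.2.3 source):

    import java.util.HashMap;
    import java.util.Map;

    class ResourcePoolSketch {
        static class Entry { int memAvailable; }
        // Stand-in for the WorkspacePersistenceDB, keyed by pool name.
        static final Map<String, Entry> db = new HashMap<>();

        static Entry getResourcepoolEntry(String poolname) {
            Entry copy = new Entry();      // a DB read returns a detached copy
            copy.memAvailable = db.get(poolname).memAvailable;
            return copy;
        }

        static void replaceResourcepoolEntry(String key, Entry entry) {
            db.put(key, entry);            // persists under whatever key is given
        }

        static void retireMem(String poolname, String hostname, int mem) {
            Entry entry = getResourcepoolEntry(poolname);
            entry.memAvailable += mem;                 // return the retired slot's memory
            replaceResourcepoolEntry(hostname, entry); // BUG: wrong key -- the row keyed
                                                       // by poolname keeps its stale value
        }
    }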

Regarding Comment #7, that information will be included in the TP1.3 docs.