Bugzilla – Bug 5415
WorkspacePersistenceDB not updated after workspace --shutdown
Last modified: 2007-10-10 14:08:05
When terminating a workspace with the workspace --shutdown parameter, the WorkspacePersistenceDB does not appear to be updated with the currently available resources. The client receives the following error after trying to deploy the same image (same specifications) that was just terminated with --shutdown:

------ Error: ------
Scheduling problem: Error creating workspace resource: [Caused by: No resource pool has an applicable entry]

In addition, the /opt/workspace/secureimages directory on the worker node is left holding the propagated image, and the client's terminal never receives the termination notice, so it hangs. Similar problems occur when the shutdown time is left to expire while a workspace is running.
The slot will be occupied until WSRF resource termination passes. Shutdown moves the VM back to the Propagated state; at that point you could, for example, call --start on it. Or did you also call --destroy and see this behavior? That would be a regression.
Thank you for clarifying that the slot is held until WSRF resource termination passes. However, further inspection suggests that the WorkspacePersistenceDB is not updated even long after the resource termination time has expired. I have tried both the --shutdown parameter and letting the shutdown time expire. In both cases the resources do not appear to be freed after WSRF resource termination passes. This is true for both local and propagated images. To confirm: when --destroy is used to terminate a workspace, the WorkspacePersistenceDB does seem to be updated properly.

I have also tried to --start a workspace that I had just --shutdown, before the resource termination time expired, and the terminal comes back with:

Using endpoint:
Address: https://142.104.60.49:8443/wsrf/services/WorkspaceService
Reference property[0]: <ns2:WorkspaceKey xmlns:ns2="http://www.globus.org/2006/08/workspace">1</ns2:WorkspaceKey>
Started.

However, querying the worker node with xm list shows that the workspace has not been restarted.
Thanks for looking into this, will confirm and fix shortly.
I can't recreate this here with TP1.2.3, though I am not using a real Xen backend; I will try with one tomorrow. Is it possible to get logs from you? If so, you can send them to me privately if they contain information you would not like archived, such as IP addresses.
Is your default shutdown method set to trash? Bug 5345 could account for some of these issues if so.
(In reply to comment #5)
> Is your default shutdown method set to trash? Bug 5345 could account for some
> of these issues if so.

Where do I set the default shutdown method, and what are my choices? Is this covered anywhere in the documentation?
Oh, this interface page SHOULD have this but it doesn't; the option was introduced in TP1.2. Will update the documentation for the next release.

http://workspace.globus.org/vm/TP1.2.3/interfaces/deployment.html

It's an optional part of the deployment request:

  <xs:element ref="DeploymentTime"/>
  <xs:element ref="WorkspaceState"/>
  <xs:element ref="ResourceAllocation"/>
  <xs:element ref="ShutdownMechanism" minOccurs="0"/>

The only supported value is "Trash"; otherwise leave the element out for a normal shutdown. This feature was added to support the situation where many VMs are started from the same template image but it is unnecessary to unpropagate them and save the "result". The prime example is an image used as a node in a pool of worker nodes driven by a batch system, mirroring the situation on most grid sites but with worker nodes added and removed dynamically.
I don't see this issue in TP1.3.0 (available shortly). There was a bug in the TP1.2.3 code that I know of that is related to this. In ResourcePoolUtil#retireMem(), there was this line:

  replaceResourcepoolEntry(hostname, entry);

which should read:

  replaceResourcepoolEntry(poolname, entry);

This update is in workspace_tp_1_3_0_branch.

Regarding Comment #7, that information will be included in the TP1.3 docs.
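To illustrate why passing hostname instead of poolname could leave a slot looking occupied, here is a minimal, self-contained sketch. It is not the workspace service code: the PoolKeySketch class, the map-based store, the integer "available MB" entry and the retireMemSketch method are all hypothetical, and it assumes that retireMem() is the step that returns memory to the pool entry. Only the name replaceResourcepoolEntry and the hostname/poolname mix-up come from this report.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the resource pool bookkeeping; NOT the real
// ResourcePoolUtil. Entries are keyed by pool name.
final class PoolKeySketch {
    // Hypothetical store: pool name -> available memory in MB.
    private final Map<String, Integer> availableMbByPool = new HashMap<>();

    // Same name as the call in the report, but this body is an assumption.
    void replaceResourcepoolEntry(String key, int availableMb) {
        availableMbByPool.put(key, availableMb);
    }

    // The scheduler is assumed to look entries up by pool name only.
    int availableMb(String poolname) {
        return availableMbByPool.getOrDefault(poolname, 0);
    }

    // Assumed behaviour: give 'mb' back to the pool entry.
    // useFix=false writes under the hostname (the TP1.2.3 pattern),
    // useFix=true writes under the pool name (the TP1.3.0 pattern).
    void retireMemSketch(String poolname, String hostname, int mb, boolean useFix) {
        int updated = availableMb(poolname) + mb;
        String key = useFix ? poolname : hostname;
        replaceResourcepoolEntry(key, updated);
    }

    public static void main(String[] args) {
        PoolKeySketch pools = new PoolKeySketch();
        pools.replaceResourcepoolEntry("default", 0); // all memory in use

        pools.retireMemSketch("default", "node1", 256, false);
        System.out.println("wrong key: " + pools.availableMb("default"));

        pools.retireMemSketch("default", "node1", 256, true);
        System.out.println("right key: " + pools.availableMb("default"));
    }
}
```

In this sketch the first call stores the freed memory under "node1", so a lookup by pool name still reports 0 MB and a new request would find no applicable entry; the second call updates "default" to 256 MB.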