Bugzilla – Bug 5322
GRAM/Derby db.lck file not cleaned up after WS container stopped
Last modified: 2008-01-16 15:19:34
If the WS container has previously been run as root and then restarted as a different user, job submission fails with the following error:

    AxisFault
     faultCode: {http://schemas.xmlsoap.org/soap/envelope/}Server.userException
     faultSubcode:
     faultString: java.rmi.RemoteException: Job creation failed.; nested exception is:
        org.globus.wsrf.ResourceException: ; nested exception is:
        SQL Exception: An SQL data change is not permitted for a read-only connection, user or database.

The problem is caused by the db.lck file in $GLOBUS_HOME/var/gram/ResourceDatabase persisting and being owned by root. Deleting the file, or chowning it to the user running the container, fixes the problem after a container restart.

This problem occurred when submitting jobs with both globusrun-ws and the JSDL development GlobusRun client. The container was started and stopped using the globus-start-container-detached and globus-stop-container-detached commands.

The problem could be resolved by deleting the db.lck file when the container is shut down.

Note: the ResourceDatabase directory also contains a subdirectory tmp which has its ownership switched to root after root has run the container. The presence of this directory doesn't seem to cause a problem, but it possibly should also be removed when the container is stopped.
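For anyone trying to confirm they are hitting this condition, a small check along the following lines may help. This is only a sketch: the class name LockFileCheck and its LockState values are made up for illustration, and it assumes the read-only symptom corresponds to db.lck existing but not being writable by the user running the check.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical diagnostic helper, not part of Globus or Derby.
public class LockFileCheck {

    public enum LockState { ABSENT, USABLE, STALE_FOREIGN }

    // Classifies the lock file from the point of view of the current user.
    public static LockState inspect(Path lockFile) {
        if (!Files.exists(lockFile)) {
            return LockState.ABSENT;
        }
        // A lock file left behind by another user (e.g. root) is typically
        // not writable by the user who now runs the container.
        return Files.isWritable(lockFile) ? LockState.USABLE : LockState.STALE_FOREIGN;
    }

    public static void main(String[] args) {
        String globusHome = System.getenv("GLOBUS_HOME");
        if (globusHome == null) {
            System.err.println("GLOBUS_HOME is not set");
            return;
        }
        // Path taken from the bug description.
        Path lock = Paths.get(globusHome, "var", "gram", "ResourceDatabase", "db.lck");
        System.out.println(lock + ": " + inspect(lock));
    }
}
```

Run it as the user who will start the container. A result of STALE_FOREIGN suggests deleting or chowning the file before restarting. Note that root can write to any file, so the check is not informative when run as root.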
One approach here may be to pass a cleanup message to services when the container's signal handler catches SIGINT.
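To make the idea concrete, here is a minimal sketch of how services could register cleanup work that runs when the JVM receives SIGINT. It relies on the standard Runtime.addShutdownHook mechanism, which the JVM runs on Ctrl-C and on normal exit. CleanupRegistry is a hypothetical name and does not reflect how the container's signal handler is actually implemented.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical registry: services add cleanup tasks, the container runs them on shutdown.
public class CleanupRegistry {

    private final List<Runnable> tasks = new ArrayList<>();

    public synchronized void register(Runnable task) {
        tasks.add(task);
    }

    // Runs tasks in reverse registration order and returns how many failed.
    // A failing task does not prevent the remaining tasks from running.
    public synchronized int runAll() {
        int failures = 0;
        for (int i = tasks.size() - 1; i >= 0; i--) {
            try {
                tasks.get(i).run();
            } catch (RuntimeException e) {
                failures++;
            }
        }
        tasks.clear();
        return failures;
    }

    // Arranges for runAll() to be called when the JVM shuts down,
    // including shutdown triggered by SIGINT.
    public void installShutdownHook() {
        Runtime.getRuntime().addShutdownHook(new Thread(this::runAll, "cleanup-hook"));
    }
}
```

A service would call register(...) with, for example, a task that closes its database, and the container would call installShutdownHook() once at startup. Shutdown hooks do not run if the process is killed with SIGKILL or the machine loses power.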
One solution that has been discussed is to provide a single Derby installation and use JNDI in core to give services access to it.
I have gone ahead and added a new interface called ContainerLifecycleListener, which executes actions at major events in the container lifecycle. I've made an implementation called DerbyContainerLifecycleListener, which cleanly starts Derby before the container starts and cleanly shuts it down after the container finishes. This should prevent lock artifacts from persisting after the container has shut down.

However, if the container goes down in an unnatural way (e.g. pulling the plug on the machine), there is no guarantee that the filesystem resources will be cleaned up. In this case, you should remove the db.lck file by hand. Note that a Ctrl-C is considered a natural way of shutting down, so the resources should be cleaned up in that situation.

This replaces the need for a single shared database. Different services can reference their own databases, and all Derby databases should be cleaned up.
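For readers unfamiliar with how embedded Derby is shut down, the sketch below shows the general shape of such a listener. The interface and method names (Listener, containerStarting, containerStopped) are hypothetical, since the actual ContainerLifecycleListener signature is not shown in this report. The Derby part uses the documented shutdown URL jdbc:derby:;shutdown=true, which reports a successful system shutdown by throwing an SQLException with SQLState XJ015.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class DerbyShutdownSketch {

    // Hypothetical lifecycle interface; the real one may differ.
    public interface Listener {
        void containerStarting();
        void containerStopped();
    }

    // Derby signals a successful system shutdown with SQLState XJ015.
    public static boolean isCleanShutdown(SQLException e) {
        return "XJ015".equals(e.getSQLState());
    }

    // Asks embedded Derby to shut down all databases in this JVM.
    // Returns false if Derby is not on the classpath or the shutdown failed.
    public static boolean shutdownDerby() {
        try (Connection unexpected = DriverManager.getConnection("jdbc:derby:;shutdown=true")) {
            return false; // a successful shutdown always throws, so this is a failure
        } catch (SQLException e) {
            return isCleanShutdown(e);
        }
    }

    public static class DerbyListener implements Listener {
        private boolean cleanStop;

        @Override
        public void containerStarting() {
            // Embedded Derby boots a database on the first connection to it,
            // so this sketch has nothing to do here.
        }

        @Override
        public void containerStopped() {
            cleanStop = shutdownDerby();
        }

        public boolean stoppedCleanly() {
            return cleanStop;
        }
    }
}
```

Shutting down the whole Derby system, rather than one named database, fits the point above that each service may keep its own database: a single call covers all of them.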