| Summary: | Remove PostgreSQL-specific queries from RFT | ||
|---|---|---|---|
| Product: | RFT | Reporter: | Ravi Madduri <madduri@mcs.anl.gov> |
| Component: | RFT | Assignee: | Ravi Madduri <madduri@mcs.anl.gov> |
| Status: | RESOLVED FIXED | ||
| Severity: | enhancement | CC: | ghulamhu@us.ibm.com, millerjj@us.ibm.com, nitswamy@in.ibm.com, ogsa-bugs@globus.org, paxhia@us.ibm.com, seelbach@us.ibm.com, tboehm@de.ibm.com |
| Priority: | P3 | ||
| Version: | development | ||
| Target Milestone: | 4.2 | ||
| Hardware: | Macintosh | ||
| OS: | All | ||
| Attachments: | removing select last_value from request_seq query | ||
I agree GT4 should be database neutral. I don't think you need to find
another way to generate IDs, or use other identifiers like UUID as an index
into the requests.
Postgres, MySQL, Cloudscape/Derby and other databases have ways of generating
sequential IDs. This is usually defined when you create the tables.
In GT4, the ReliableFileTransferDbAdapter.java code references a
Postgres extension (i.e. SEQUENCE) with the following statement:
ResultSet rs = statement.executeQuery("SELECT last_value FROM "
+ "request_seq");
In GT3, in the TransferDbAdapter.java code you use a more database generic
statement:
ResultSet rs = statement.executeQuery("SELECT COUNT(id) FROM " + "request");
Why not use this more generic mechanism?
Because when RFT resources reach the end of their lifetime they are removed from the database, so "SELECT COUNT(id) FROM request" may return an ID that has already been used. It worked in GT 3.2 because the service never cleaned records out of the database.
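The collision described above can be illustrated with a minimal in-memory sketch. It is not the RFT code: the class `CountIdCollision` and its methods are hypothetical, a `HashSet` stands in for the `request` table, and the next ID is assumed to be taken as the row count plus one.

```java
import java.util.HashSet;
import java.util.Set;

public class CountIdCollision {
    /** Stand-in for the "request" table: the ids of rows that still exist. */
    private final Set<Integer> liveIds = new HashSet<>();
    /** Every id ever handed out, kept only so the demo can detect reuse. */
    private final Set<Integer> issuedIds = new HashSet<>();

    /** Mimics deriving the next id from SELECT COUNT(id) FROM request (assumed: count + 1). */
    int nextIdFromCount() {
        return liveIds.size() + 1;
    }

    /** Inserts a row; returns false if this id had already been issued before. */
    boolean insert(int id) {
        liveIds.add(id);
        return issuedIds.add(id);
    }

    /** Mimics the cleanup of a resource whose lifetime has expired. */
    void expire(int id) {
        liveIds.remove(id);
    }

    /** Returns true if the count-based scheme hands out an id twice. */
    public static boolean demoReusesId() {
        CountIdCollision table = new CountIdCollision();
        table.insert(table.nextIdFromCount()); // id 1
        table.insert(table.nextIdFromCount()); // id 2
        table.insert(table.nextIdFromCount()); // id 3
        table.expire(1);                       // request 1 is cleaned up, 2 rows remain
        int next = table.nextIdFromCount();    // 2 + 1 = 3, which is still in use
        return !table.insert(next);            // insert reports that id 3 was issued before
    }

    public static void main(String[] args) {
        System.out.println("COUNT-based id reused: " + demoReusesId());
    }
}
```

As long as no row is ever deleted, the count grows with every insert and the scheme never repeats an ID, which is consistent with it having worked in GT 3.2.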
Via the email gateway:
Ravi, I have a few doubts about your response and would like further
clarification regarding this. Let's consider the following steps:
(1) state of a transfer request is active now.
(2) the database is accessed and a row is created
(3) Let's assume the autoincrement column is used to generate ids, so an id
is chosen
(4) the service has the id of the request and it can use that id to refer
to the respective row in the database
(5) The service uses the id by doing select count(id), let's say in the
following situations:
-> when the marker is changed
-> when the container goes down and comes back up again
-> when the lifetime of the request expires
(6) Finally, in an ideal case the transfer request reaches the "Finished"
state and a transfer is successfully completed.
(7) The service starts the cleanup process.
Now, I have no idea how this cleanup process works or what the details
are. How does this cleanup process make the use of a sequence
in Postgres safer, agreeable and feasible, and why and how does it make
count(id) problematic?
Probably I am not understanding exactly what the pros and cons of one
technique over another are. We agree that the service persists its state data,
and hence the ids it is keeping track of and using to access the
db. What I don't understand is how the service persists the id data if a
sequence is used, why it won't persist the id data if count(id) is used,
and what the impact of the cleanup process is on either of these
techniques.
I would really appreciate some further clarification regarding this issue.
Thanks a lot,
Shama Ghulamhussain
IBM Advanced Systems Infrastructure Development
T/L: 295 8284
ghulamhu@us.ibm.com
Created an attachment (id=539)
removing select last_value from request_seq query
OK, this is what you have to do:
1. Remove request_seq from the RFT database schema.
2. Apply this patch if possible, or do something similar to what I do here.
Basically, you should insert the requestId yourself instead of having the
database insert it with the next value of the sequence.
3. See if you can get away with doing the same trick with the transfer table,
and let me know how it works out.
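Step 2 can be sketched as follows. This is not the attached patch: the class `ServiceAssignedIds`, its in-memory map standing in for the `request` table, and the counter seeded from previously persisted state are all assumptions made for illustration. How the counter would survive a container restart is not covered here.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

public class ServiceAssignedIds {
    /** Stand-in for the "request" table: id -> request description. */
    private final Map<Integer, String> requestTable = new LinkedHashMap<>();
    /** Hypothetical service-side counter; cleanup never decrements it. */
    private final AtomicInteger lastId;

    ServiceAssignedIds(int lastIdFromPersistedState) {
        this.lastId = new AtomicInteger(lastIdFromPersistedState);
    }

    /**
     * The service chooses the id and inserts it explicitly, in the spirit of
     * INSERT INTO request (id, ...) VALUES (?, ...), so no sequence is needed.
     */
    int insertRequest(String description) {
        int id = lastId.incrementAndGet();
        if (requestTable.putIfAbsent(id, description) != null) {
            throw new IllegalStateException("duplicate request id " + id);
        }
        return id;
    }

    /** Mimics removal of a request whose lifetime has expired. */
    void cleanup(int id) {
        requestTable.remove(id);
    }

    /** Inserts three requests, cleans one up, inserts a fourth; returns the ids issued. */
    public static int[] demo() {
        ServiceAssignedIds service = new ServiceAssignedIds(0);
        int a = service.insertRequest("transfer a");
        int b = service.insertRequest("transfer b");
        int c = service.insertRequest("transfer c");
        service.cleanup(a);                          // deleting a row does not free its id
        int d = service.insertRequest("transfer d"); // next counter value, not a reused id
        return new int[] {a, b, c, d};
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(demo()));
    }
}
```

Because the ID comes from the service rather than from the row count or a database sequence, deleting rows does not affect which ID is issued next, and the same INSERT statement works on any database.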
Any updates on this one?
I just committed changes to trunk and globus_4_0_branch that avoid using sequences in Postgres. I tested my changes with MySQL.