
Notes:


The two-phase commit starts by asking the primary master node to generate a TID, unless the client application provides one, in which case nothing needs to happen at the network level.

Object data can then be sent to the storage nodes.

Sending an object takes a write lock on that object; the lock is local to each storage node. If the lock is already taken when the node receives the request, the node checks the locking TID. If that TID is later than the one requesting the lock, the node notifies the client of a conflict: because the object is already being modified by a later transaction, no conflict resolution can happen, so the client always raises a conflict error in this case. Otherwise, the node delays the store operation until the lock is released.
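The locking rule above can be sketched as follows. This is a minimal illustration, not the actual implementation: the names StorageLocks, try_lock and StoreOutcome are hypothetical, TIDs are assumed to be comparable integers, and granting the lock again to the transaction that already holds it is an assumption the notes do not cover.

```python
from enum import Enum


class StoreOutcome(Enum):
    GRANTED = "granted"    # lock taken, the store can proceed
    DELAYED = "delayed"    # wait until the current holder releases the lock
    CONFLICT = "conflict"  # a later transaction already holds the lock


class StorageLocks:
    """Hypothetical per-node write-lock table: object id -> locking TID."""

    def __init__(self):
        self._locks = {}

    def try_lock(self, oid, tid):
        locking_tid = self._locks.get(oid)
        # Free lock (or, by assumption, one already held by this transaction).
        if locking_tid is None or locking_tid == tid:
            self._locks[oid] = tid
            return StoreOutcome.GRANTED
        # Held by a later transaction: the client must raise a conflict error.
        if locking_tid > tid:
            return StoreOutcome.CONFLICT
        # Held by an earlier transaction: delay the store until release.
        return StoreOutcome.DELAYED
```

Each storage node would keep its own table, since the lock is local to the node.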

This locking scheme can lead to deadlocks if multiple client nodes send the same object to multiple storage nodes. These deadlocks are resolved by a time-out mechanism: if a client does not get a response to a store request after some time, it asks the storage node whether the store request is being delayed by a write lock, and if so, it aborts the transaction.
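The client-side time-out decision might look like the sketch below. The function name, its parameters and the return values are hypothetical; is_delayed_by_lock stands in for the query sent to the storage node, and continuing to wait when the store is not delayed by a lock is an assumption, since the notes do not describe that case.

```python
def check_pending_store(sent_at, now, timeout, is_delayed_by_lock):
    """Decide what to do with a store request that has no response yet.

    sent_at, now and timeout are in seconds; is_delayed_by_lock is a
    callable standing in for the query to the storage node.
    """
    if now - sent_at < timeout:
        return "wait"    # not timed out yet
    if is_delayed_by_lock():
        return "abort"   # possible deadlock: abort the transaction
    return "wait"        # assumption: the response is merely slow
```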

When the lock can be taken for an operation, the storage node checks for conflicts on the stored object. A conflict is detected by checking whether the base revision of the object (the revision the client got from the database when it started modifying it) matches its current committed revision. If the two revisions are equal, the storage node sends an acknowledgement to the client node. Otherwise, the storage node does not take the write lock for that object and responds to the client with a conflict notification.
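The revision comparison can be illustrated with a short sketch. The function check_store and the dictionary of committed revisions are hypothetical, and treating a new object (no committed revision, no base revision) as conflict-free is an assumption.

```python
def check_store(committed_serials, oid, base_serial):
    """Compare the client's base revision with the committed one.

    committed_serials maps object id -> current committed revision.
    Returns ("ack", None) when they match, otherwise
    ("conflict", current_revision).
    """
    current = committed_serials.get(oid)
    if current == base_serial:
        return ("ack", None)
    # The object changed since the client read it: report the conflict.
    return ("conflict", current)
```

Returning the current committed revision along with the conflict is a design choice in this sketch, made so that a client could attempt conflict resolution against that revision.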

The vote ends this commit phase simply by waiting for all pending responses from the storage nodes.

If the transaction must be aborted, it is enough to ask all storage nodes to release the locks held by that transaction.
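On a single storage node, releasing a transaction's locks could be sketched as below. The function release_locks and the plain dictionary used as a lock table (object id -> locking TID) are hypothetical.

```python
def release_locks(locks, tid):
    """Drop every write lock held by transaction `tid`.

    locks maps object id -> locking TID and is modified in place.
    Returns the list of object ids that were unlocked.
    """
    released = [oid for oid, holder in locks.items() if holder == tid]
    for oid in released:
        del locks[oid]
    return released
```

In this sketch, an abort would amount to running this on every storage node that received data for the transaction.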