Report ID
1994-06
Report Authors
H. Leong and D. Agrawal
Report Date
Abstract
Recovery from failures is important in distributed computing. A common technique to support recovery is asynchronous checkpointing, coupled with optimistic message logging. These schemes have low overheads during failure-free operations and can provide an acceptable degree of fault-tolerance. Central to these protocols is the determination of a maximal consistent global state, which is recoverable. Message semantics is not exploited in most existing recovery protocols to determine the recoverable state. We propose to identify messages that are not influential in the computation through message semantics. These messages can be logically removed from the computation without changing its meaning or result. In this paper, we illustrate with examples how the removal of these messages improves the theoretical maximal consistent global state. Taking semantics into account, recovery protocols are then developed to realize the idea. The semantics in object-oriented databases is adapted to special processes acting as servers for further improvements. This technique can also be applied to ensure a more timely commitment for output in a distributed computation.
Document
1994-06.ps (267.4 KB)