Grid computing applies the resources of many networked computers to a single problem or task at the same time. A drawback is that the computers performing the calculations may not always be trustworthy and may fail periodically. The larger the number of nodes in the grid, the greater the probability that some node fails. Executing workflows reliably therefore requires fault tolerance and recovery strategies. This paper proposes a method in which an instantaneous snapshot of the local state of the processes within each node is recorded. An efficient algorithm is introduced for detecting node failures using watchdog timers. For recovery, a divide-and-conquer algorithm is used that avoids redoing already completed jobs, enabling faster recovery. © 2010 Kongu Engineering College.
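As a rough illustration of watchdog-based failure detection in general, the sketch below keeps one timer per node and reports any node that has not reset its timer within a timeout. It is not the paper's hierarchical algorithm, which the abstract does not spell out; the class name `WatchdogMonitor`, its methods, and the timeout value are all hypothetical choices made for this example.

```python
import time
from typing import Callable, Dict, List


class WatchdogMonitor:
    """Hypothetical monitor: one watchdog timer per node.

    A node that does not reset its timer within `timeout` seconds
    is reported as failed. This is an illustrative assumption, not
    the algorithm from the paper.
    """

    def __init__(self, timeout: float,
                 clock: Callable[[], float] = time.monotonic):
        self.timeout = timeout
        self.clock = clock  # injectable so the timer can be simulated
        self.last_reset: Dict[str, float] = {}

    def register(self, node_id: str) -> None:
        # Start the node's watchdog timer at the current time.
        self.last_reset[node_id] = self.clock()

    def heartbeat(self, node_id: str) -> None:
        # A live node periodically resets ("kicks") its watchdog.
        self.last_reset[node_id] = self.clock()

    def failed_nodes(self) -> List[str]:
        # Nodes whose timers have run past the timeout are presumed failed.
        now = self.clock()
        return sorted(node for node, t in self.last_reset.items()
                      if now - t > self.timeout)
```

In use, each node would call `heartbeat` at an interval shorter than the timeout, and a supervisor would poll `failed_nodes` to decide which jobs need recovery. The clock is passed in as a parameter only so that the behaviour can be checked without waiting in real time.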
Cited by (since 1996): 0; Conference Code: 84556
A. H. Bhagyashree, Pradeep, D., Jayanthy, N., Mounica, K. V., Nivejaa, S., and P. Dharani, S., “A hierarchical fault detection and recovery in a computational grid using watchdog timers”, in Proceedings of 2010 International Conference on Communication and Computational Intelligence, INCOCCI-2010, Perundurai, Erode, 2010, pp. 467-471.