Publication Type:

Conference Paper

Source:

Proceedings of the 2010 International Conference on Communication and Computational Intelligence (INCOCCI-2010), Perundurai, Erode, pp. 467-471 (2010)

ISBN:

9788183713696

URL:

http://www.scopus.com/inward/record.url?eid=2-s2.0-79954594711&partnerID=40&md5=c6bf54e2d77d2241df70ccc6e8025d98

Keywords:

Algorithms, Artificial intelligence, cluster, Cluster computing, Computational grids, Divide-and-conquer algorithm, Efficient algorithm, Fault, Fault detection, Fault tolerance, Fault-tolerant, Grid computing, load balancing, Local state, Node failure, Parallel architectures, Quality assurance, recovery, Recovery strategies, watch dog timer, Watchdog timers, Work-flows

Abstract:

Grid computing applies the resources of many individual computers in a network to a single problem or task at the same time. The drawback is that the computers actually performing the calculations may not always be trustworthy and may fail periodically, and the larger the number of nodes in the grid, the greater the probability that some node fails. To execute workflows in a fault-tolerant manner, fault tolerance and recovery strategies are therefore required. This paper proposes a method in which an instantaneous snapshot of the local state of the processes within each node is recorded. An efficient algorithm is introduced for detecting node failures using watchdog timers. For recovery, we use a divide-and-conquer algorithm that avoids redoing jobs that have already been completed, enabling faster recovery. © 2010 Kongu Engineering College.
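The abstract names two mechanisms (watchdog-timer failure detection and divide-and-conquer recovery that skips completed jobs) without giving their details, so the following is only a minimal, generic sketch of how such mechanisms are commonly built, not the authors' algorithm. It assumes that each node periodically "kicks" (resets) its watchdog, for example on a heartbeat, and that a node's snapshot can be reduced to the set of job indices it has completed. All names (`Watchdog`, `kick`, `failed_nodes`, `pending_ranges`, `merge_ranges`) are hypothetical.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Watchdog:
    """Hypothetical per-node watchdog timer (not the paper's algorithm).

    Each node must "kick" its timer periodically; a node whose timer has
    not been kicked within `timeout` seconds is reported as failed.
    """
    timeout: float
    last_kick: Dict[str, float] = field(default_factory=dict)

    def kick(self, node: str, now: Optional[float] = None) -> None:
        # Reset the node's timer (e.g. on receiving its heartbeat).
        self.last_kick[node] = time.monotonic() if now is None else now

    def failed_nodes(self, now: Optional[float] = None) -> List[str]:
        # A node has failed if its timer expired without being reset.
        now = time.monotonic() if now is None else now
        return sorted(n for n, t in self.last_kick.items() if now - t > self.timeout)


def pending_ranges(lo: int, hi: int, done: Set[int]) -> List[Tuple[int, int]]:
    """Divide and conquer over job indices [lo, hi).

    `done` is assumed to be the set of job indices recorded as completed in
    the failed node's last snapshot. Fully completed halves are pruned,
    untouched halves are returned whole, and mixed halves are split again,
    so no completed job is scheduled twice.
    """
    if hi <= lo:
        return []
    finished = sum(1 for j in range(lo, hi) if j in done)
    if finished == hi - lo:
        return []            # everything here already completed: skip it
    if finished == 0:
        return [(lo, hi)]    # nothing completed here: reassign as one block
    mid = (lo + hi) // 2
    return pending_ranges(lo, mid, done) + pending_ranges(mid, hi, done)


def merge_ranges(ranges: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    # Coalesce adjacent half-open ranges such as (3, 4), (4, 5) into (3, 5).
    merged: List[Tuple[int, int]] = []
    for lo, hi in ranges:
        if merged and merged[-1][1] == lo:
            merged[-1] = (merged[-1][0], hi)
        else:
            merged.append((lo, hi))
    return merged


if __name__ == "__main__":
    wd = Watchdog(timeout=5.0)
    wd.kick("node-1", now=0.0)
    wd.kick("node-2", now=0.0)
    wd.kick("node-1", now=4.0)          # node-2 stops responding
    print(wd.failed_nodes(now=7.0))     # ['node-2']

    # Suppose node-2 owned jobs 0..7 and its snapshot shows 0, 1, 2, 5 done.
    todo = merge_ranges(pending_ranges(0, 8, {0, 1, 2, 5}))
    print(todo)                         # [(3, 5), (6, 8)]
```

Passing `now` explicitly keeps the sketch deterministic for testing; in a live system it would default to `time.monotonic()`. The timeout value trades detection latency against false alarms from slow but healthy nodes.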

Notes:

Cited by (since 1996): 0; Conference Code: 84556

Cite this Research Publication

A. H. Bhagyashree, Pradeep, D., Jayanthy, N., Mounica, K. V., Nivejaa, S., and P. Dharani, S., “A hierarchical fault detection and recovery in a computational grid using watchdog timers”, in Proceedings of 2010 International Conference on Communication and Computational Intelligence, INCOCCI-2010, Perundurai, Erode, 2010, pp. 467-471.