Course Syllabus
Hardware fault tolerance, software fault tolerance, information redundancy, checkpointing, fault-tolerant networks, reconfiguration-based fault tolerance, and simulation techniques.
Dependability Concepts: dependable systems; techniques for achieving dependability; dependability measures; faults, errors, and failures; and classification of faults and failures.
Fault Tolerance Strategies: fault detection, masking, containment, location, reconfiguration, and recovery.
Fault-Tolerant Design Techniques: hardware redundancy, software redundancy, time redundancy, and information redundancy.
Dependable Communication: dependable channels, survivable networks, and fault-tolerant routing.
Fault recovery, stable storage and RAID architectures, and data replication and resiliency.
Case studies of fault-tolerant multiprocessor and distributed systems.