Syllabus
Goals and Applications of Fault-Tolerant Computing: Reliability, Availability, Safety, Dependability, Long Life, Critical Computation, High-Availability Applications, Fault Tolerance as a Design Objective.
Fault Models: Faults, Errors, and Failures; Causes and Characteristics of Faults; Logical and Physical Faults; Error Models.
Fault-Tolerant Design Techniques: Hardware Redundancy, Software Redundancy, Time Redundancy, and Information Redundancy; Checkpointing; Fault-Tolerant Networks; Reconfiguration-Based Fault Tolerance.
Reliability Evaluation Techniques: Failure Rate, Mean Time to Repair (MTTR), Mean Time Between Failures (MTBF), Reliability Modelling, Fault Coverage, M-of-N Systems, Markov Models, Safety, Maintainability, Availability.
Case Studies of Fault-Tolerant Systems and Current Research Issues: Space Shuttle, Tandem 16 NonStop System, Recovery-Oriented Computing, Fault-Tolerant Platforms for Automotive Safety-Critical Systems, Reliability and Fault Tolerance in Collective Robot Systems.