Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.2 9/18/84; site utcsrgv.UUCP
Path: utzoo!utcsrgv!voula
From: voula@utcsrgv.UUCP (Voula Vanneli)
Newsgroups: ont.events
Subject: Systems Seminar
Message-ID: <758@utcsrgv.UUCP>
Date: Mon, 11-Feb-85 11:56:20 EST
Article-I.D.: utcsrgv.758
Posted: Mon Feb 11 11:56:20 1985
Date-Received: Mon, 11-Feb-85 14:52:44 EST
Distribution: ont
Organization: CSRI, University of Toronto
Lines: 66

UNIVERSITY OF TORONTO
DEPARTMENT OF COMPUTER SCIENCE

(SF = Sandford Fleming Building, 10 King's College Road)
(MC = McLennan Physical Laboratories, 60 St. George Street)
(RS = Rosebrugh Building, Taddlecreek Road)

SYSTEMS SEMINAR - Tuesday, February 19, 10 a.m., SF 1105

Dr. Jim Gray
Tandem Computers, Cupertino, CA

"Approaches to Fault Tolerant Computing"

ABSTRACT

First, an overview of the relative importance of human, software and
hardware faults is presented. I argue that reliable hardware is a
reality and that most system failures are due to software and human
errors. Simple user interfaces are the solution to human errors.
Several approaches to tolerating software errors are presented:
transactions, lock-step process pairs, shadow process pairs, and
persistent process pairs. It is argued that software errors are soft
and hence transactions plus persistent process pairs are the best
solution.

Discussion then turns from fault tolerant execution to fault tolerant
storage. It is shown why data mirroring is more relevant than exact
replicas and why majority replica schemes are not appropriate in a
local or long-haul network. I argue that schemes for long-haul
replication require inconsistency and then present the Snap Shot and
ASAP replica approaches with examples.