Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.2 9/18/84; site utcsrgv.UUCP
Path: utzoo!utcsrgv!voula
From: voula@utcsrgv.UUCP (Voula Vanneli)
Newsgroups: ont.events
Subject: Systems Seminar
Message-ID: <758@utcsrgv.UUCP>
Date: Mon, 11-Feb-85 11:56:20 EST
Article-I.D.: utcsrgv.758
Posted: Mon Feb 11 11:56:20 1985
Date-Received: Mon, 11-Feb-85 14:52:44 EST
Distribution: ont
Organization: CSRI, University of Toronto
Lines: 66

                   UNIVERSITY OF TORONTO
               DEPARTMENT OF COMPUTER SCIENCE
  (SF = Sandford Fleming Building, 10 King's College Road)
(MC = McLennan Physical Laboratories, 60 St. George Street)
        (RS = Rosebrugh Building, Taddle Creek Road)

SYSTEMS SEMINAR - Tuesday, February 19, 10 a.m., SF 1105

                        Dr. Jim Gray
              Tandem Computers, Cupertino, CA

         "Approaches to Fault Tolerant Computing" *

                          ABSTRACT

     First, an overview of the relative importance of human,
software  and  hardware  faults  is presented.  I argue that
reliable hardware is a reality and that most system failures
are due to software and human errors.

     Simple  user  interfaces  are  the  solution  to  human
errors.

     Several approaches to tolerating  software  errors  are
presented:  transactions,  lock-step  process  pairs, shadow
process pairs, and persistent process pairs.  It  is  argued
that  software  errors  are soft and hence transactions plus
persistent process pairs are the best solution.

     Discussion then turns from fault tolerant execution  to
fault  tolerant  storage.  It is shown why data mirroring is
more relevant than exact replicas and why  majority  replica
schemes are not appropriate in a local or long-haul network.
I argue  that  schemes  for  long-haul  replication  require
inconsistency  and  then  present  the  Snap  Shot  and ASAP
replica approaches with examples.