Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.1 6/24/83; site fortune.UUCP
Path: utzoo!watmath!clyde!floyd!harpo!ihnp4!fortune!rpw3
From: rpw3@fortune.UUCP
Newsgroups: net.unix-wizards
Subject: Re: UNIX IPC Datagram Reliability under - (nf)
Message-ID: <2392@fortune.UUCP>
Date: Tue, 31-Jan-84 07:06:39 EST
Article-I.D.: fortune.2392
Posted: Tue Jan 31 07:06:39 1984
Date-Received: Tue, 7-Feb-84 05:08:06 EST
Sender: notes@fortune.UUCP
Organization: Fortune Systems, Redwood City, CA
Lines: 93

#R:allegra:-220500:fortune:11600049:000:4908
fortune!rpw3    Jan 31 02:26:00 1984

[This lengthy tutorial probably belongs in net.arch, but the discussion has been here so far.]

O.k., nobody has come forth to defend "UNIX domain datagrams", so here it is...

>>> Why datagrams SHOULD be "unreliable". <<<

The internet datagram "style" is based on the observation that the end processes in any communication are ultimately responsible for "transaction integrity", so they might as well be responsible for all of it. No amount of intermediate error checking and retransmission can GUARANTEE reliable synchronization if the ultimate producer and consumer do not do the handshake. Layers upon layers of protocols don't hack it if the critical state is outside the end process. Nodes can crash; links can crash; nodes and links can go down and up. Servers (e.g. mail) still have to do their own ultimate lost-message and duplication checking. (I will not argue that point further. If you disagree, go see your local communications wizard and get him/her to explain.) (Also, a moment of silence for anyone who thinks X.25 is a "reliable" protocol.)

Given that the responsibility for ultimate error correction lies in the end-point processes, the transmission and switching portion of the net can get A LOT cheaper and simpler.
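To make the end-to-end argument concrete, here is a minimal modern sketch (Python, my own illustration -- nothing like this appears in any 1984 UNIX; all names are hypothetical) of the handshake only the endpoints can do: the channel drops packets at random, yet the producer and consumer still get exactly-once delivery via sequence numbers, acks, and retries.

```python
# Hypothetical sketch: end-to-end reliability built ON TOP of a lossy
# channel. The "network" (lossy_send) may silently drop anything; the
# endpoints own all the critical state (sequence numbers, retry loop,
# duplicate suppression), exactly as the article argues they must.
import random

random.seed(42)

def lossy_send(queue, pkt, loss=0.3):
    """Best-effort channel: delivers the packet, or silently drops it."""
    if random.random() >= loss:
        queue.append(pkt)

class Receiver:
    def __init__(self):
        self.expected = 0       # next sequence number we will accept
        self.delivered = []     # messages handed up to the application
    def on_packet(self, seq, data):
        if seq == self.expected:        # new message: deliver it once
            self.delivered.append(data)
            self.expected += 1
        # a duplicate (seq < expected) is acked but NOT re-delivered
        return seq                      # ack what we saw

def send_reliably(rx, seq, data, max_tries=50):
    """Sender-side loop: retransmit until the ack makes it back."""
    for _ in range(max_tries):
        to_rx, to_tx = [], []
        lossy_send(to_rx, (seq, data))              # data may be lost...
        for pkt in to_rx:
            lossy_send(to_tx, rx.on_packet(*pkt))   # ...and so may the ack
        if seq in to_tx:
            return True                             # handshake complete
    return False

rx = Receiver()
for i, msg in enumerate(["alpha", "beta", "gamma"]):
    assert send_reliably(rx, i, msg)
print(rx.delivered)     # all three messages, once each, in order
```

Note that the channel never retransmits, buffers, or even reports a loss; every guarantee lives in the two endpoints, which is the whole point.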
Instead of trying (vainly) to GUARANTEE that no data is lost (with the attendant headaches of very careful buffer management, flow control, load shedding, load balancing, re-routing, synchronizing, etc.), in the internet datagram style (DoD IP, Xerox NS, etc.) the transmission system makes a "good effort" to get your packet from here to there. The only thing that IS demanded is that the probability of receiving a bad (damaged) packet that is claimed to be good be VERY small. (Since that is a one-way requirement, it's fairly easy.) So if the packet has a bit error, throw it away; if the outgoing queue won't hold the packet, throw it away (that line's overloaded anyway); if the route's not valid anymore, toss it. Somebody (the end process) will try again soon anyway.

(Two notes: 1. It is considered polite BUT NOT NECESSARY to send an error packet back, if you know where "back" is; and 2. if the system is to be generally considered usable, the long-term loss rate should be less than 1%, although short-term losses of 10% or more don't hurt anything.)

This seemingly cavalier attitude results in ENORMOUS savings in complexity, memory, and CPU ticks for the intermediate nodes, which merely make a (good but not perfect) attempt to throw the packet out the next link. Packet-switching rates of several hundred to several thousand per second are easily attainable with cheap micros. The routers don't have to have any "memory" (other than the routing tables). They are not responsible for "connections", "re-transmissions", or "timeouts". They can't tell a terminal from a file (since they don't know about either!).

Secondly, the CPU/memory load of handling the connections/retransmissions/etc. is spread out where there are lots of resources -- at the end points. The backbone nodes just move data, so they can move lots of it. (Think of a hundred IBM PC's using your VAX to move files back and forth. Who do you want to do the busy work, the VAX or the PC's?)
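The drop-it-and-move-on rules above fit in a dozen lines. Here is a hypothetical sketch (Python, my own illustration; the table and queue names are invented, and the checksum is a stand-in for a real one) of the entire forwarding decision a stateless best-effort router has to make:

```python
# Hypothetical sketch of stateless "good effort" forwarding. The router
# holds NO per-connection state -- just a routing table and bounded
# per-link output queues -- and every failure case gets the same
# treatment: drop the packet; the end process will retry.
from collections import deque

QUEUE_LIMIT = 4
routes = {"hostA": "line0", "hostB": "line1"}        # the routing table
out_queues = {"line0": deque(), "line1": deque()}    # per-link queues

def checksum_ok(pkt):
    # stand-in for a real header/data checksum
    return pkt.get("sum") == sum(pkt["data"]) & 0xFFFF

def forward(pkt):
    """Return the line the packet was queued on, or None if dropped."""
    if not checksum_ok(pkt):        # bit error? throw it away
        return None
    line = routes.get(pkt["dst"])
    if line is None:                # route not valid anymore? toss it
        return None
    q = out_queues[line]
    if len(q) >= QUEUE_LIMIT:       # queue full? that line's overloaded
        return None                 # anyway -- drop it
    q.append(pkt)                   # good effort made; on to the next one
    return line

good = {"dst": "hostA", "data": [1, 2, 3], "sum": 6}
assert forward(good) == "line0"                                          # forwarded
assert forward({"dst": "hostA", "data": [1, 2, 3], "sum": 99}) is None   # damaged
assert forward({"dst": "hostZ", "data": [], "sum": 0}) is None           # no route
```

Nothing here remembers which drop happened or why, which is exactly why a cheap micro can run this loop thousands of times a second.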
Thirdly, the end process always had to do about 70-90% of the work anyway, duplicating the work the network was doing (and sometimes triplicating the work that the kernel was duplicating, on top of that); the added 10-30% is easily justified by the savings in the net (or in the kernel, if we are talking about process-to-process on a single host -- I didn't forget). The total number of CPU ticks on an end-point processor can even go DOWN, because of the smaller number of encapsulations (layers) packets have to go through. (In the simplest case, there are only three layers: client, datagram router or internet, and physical.)

Lastly, there are some applications (voice, time-of-day) where you do not want the network trying to "help" you. A voice packet that is perfect but late because it got retransmitted might as well have been lost -- it's useless. Ditto time-of-day.

(whew! is it soup yet?)

So "unreliable" when talking about datagrams means "not perfect", and is a desirable attribute. Desirable, since the cost of "reliability" is very high and the goal illusory in any case. On a single processor it sometimes makes sense to have other (reliable) inter-process primitives besides datagrams, if (1) throughput is paramount and (2) the set of cooperating processes will NEVER be distributed. But the overhead of handling the "retransmission" can be made small (and processes DO die sometimes, even on uni-processors), so the argument for "reliable" IPC is weaker than most people think.

Rob Warnock

UUCP:	{sri-unix,amd70,hpda,harpo,ihnp4,allegra}!fortune!rpw3
DDD:	(415)595-8444
USPS:	Fortune Systems Corp, 101 Twin Dolphins Drive, Redwood City, CA 94065