Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.2 9/18/84; site gatech.UUCP
Path: utzoo!utcs!lsuc!pesnta!amd!gatech!jeff
From: jeff@gatech.UUCP (Jeff Lee)
Newsgroups: net.lang,net.lang.pascal
Subject: Re: Pointers and hashing
Message-ID: <11928@gatech.UUCP>
Date: Thu, 7-Feb-85 14:46:31 EST
Article-I.D.: gatech.11928
Posted: Thu Feb  7 14:46:31 1985
Date-Received: Fri, 8-Feb-85 06:42:17 EST
References: <400@decwrl.UUCP> <143@sdcc13.UUCP>
Organization: School of ICS, Georgia Institute of Technology, Atlanta
Lines: 61
Xref: utcs net.lang:1359 net.lang.pascal:223

> 
>     Hashing IS NOT an O(1) operation!
> 
>     Hashing is an O(N)/buckets operation, where many times you can use enough
> buckets to make it very fast.
> 

I would sure like to know how you came up with this. It seems to me that you
must account for the collision handling before you can make a blanket
statement about the cost. In fact, I think you'll see that with appropriate
collision handling you end up with an algorithm that is essentially O(1).
Remember, if one operation takes 3 seconds and another takes 7 seconds, both
are O(1) as long as each takes the same amount of time no matter what size of
input you are dealing with. Before I go into the different collision schemes,
what about a minimal perfect hash? The operation is guaranteed to be O(1) and
the table is completely full.

In the following, let (a = number of items to hash/number of buckets), which
is the loading density expressed as a fraction (e.g., 0.4). Hash tables are a
little different in that there are two costs associated with them, both given
below as the expected number of probes:
	1) the time to locate an item in the table;
	2) the time to find out that an item is not in the table.

With linear open addressing (use the next open slot) we get

	1) the time to locate an item in the table is
		.5 * (1 + (1 / (1 - a)))
	2) the time to find out it isn't in the table is
		.5 * (1 + (1 / (1 - a)**2))
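
The point of these formulas is that they depend only on a, not on how many
items are in the table. A rough way to check that is a small simulation. The
sketch below is in Python (my choice of language); the function names, the key
range, and the seed are all made up for illustration. It fills a linear
probing table to a given load and averages the probes per successful lookup.

```python
import random

def average_hit_probes(num_buckets, load, seed=1):
    """Fill a linear-probing table to `load`, then average probes per hit."""
    rng = random.Random(seed)
    table = [None] * num_buckets
    # Distinct random keys; key % num_buckets stands in for the hash function.
    keys = rng.sample(range(10**9), int(load * num_buckets))
    for key in keys:
        slot = key % num_buckets
        while table[slot] is not None:       # collision: use the next open slot
            slot = (slot + 1) % num_buckets
        table[slot] = key
    total = 0
    for key in keys:
        slot, probes = key % num_buckets, 1
        while table[slot] != key:            # walk forward until the key turns up
            slot = (slot + 1) % num_buckets
            probes += 1
        total += probes
    return total / len(keys)

def predicted_hit_probes(load):
    """The formula for case 1 above: .5 * (1 + (1 / (1 - a)))."""
    return 0.5 * (1 + 1 / (1 - load))

for n in (1_000, 10_000, 100_000):
    print(n, round(average_hit_probes(n, 0.5), 2), predicted_hit_probes(0.5))
```

If the argument holds, the measured average should stay close to the
predicted 1.5 at a = .5 for all three table sizes, even though the number of
items grows by a factor of 100.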

With rehashing, random probing, and quadratic probing (approximately)

	1)	- (1 / a) * ln(1 - a)
	2)	1 / (1 - a)
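
A quick way to keep the two random probing expressions straight: with open
addressing, finding a key costs what it cost to insert it, and it was inserted
when the table was emptier, so a hit can never average more probes than a miss
at the same load. That makes 1 / (1 - a) the miss cost and - (1 / a) * ln(1 - a)
the hit cost. A small Python sketch (function names are mine, for illustration
only) that tabulates both:

```python
import math

def random_probe_hit(a):
    """Expected probes to find a key that is present, load factor 0 < a < 1."""
    return -math.log(1 - a) / a

def random_probe_miss(a):
    """Expected probes to decide that a key is absent."""
    return 1 / (1 - a)

for a in (0.25, 0.5, 0.75, 0.9):
    print(f"a={a:4.2f}  hit={random_probe_hit(a):5.2f}  miss={random_probe_miss(a):5.2f}")
```

Neither function takes the number of items as an argument, which is the whole
point: the cost is fixed by how full the table is.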

And finally, for linear chaining (which is normally the best)

	1)	1 + (a / 2)
	2)	a
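
For chaining, counting key comparisons, a hit averages about 1 + (a / 2) and a
miss averages a (the whole chain gets scanned, and the average chain length is
a). The Python sketch below measures both on a simulated table. The function
name, the key range, and the seed are assumptions made for the example.

```python
import random

def chaining_costs(num_buckets, load, seed=1):
    """Return (average comparisons per hit, average comparisons per miss)."""
    rng = random.Random(seed)
    buckets = [[] for _ in range(num_buckets)]
    keys = rng.sample(range(10**9), int(load * num_buckets))
    for key in keys:
        buckets[key % num_buckets].append(key)    # new keys go on the chain's tail
    # A hit compares against every key up to and including the match.
    hit = sum(buckets[k % num_buckets].index(k) + 1 for k in keys) / len(keys)
    # A miss scans a whole chain; a uniformly hashed absent key sees the
    # average chain length.
    miss = sum(len(chain) for chain in buckets) / num_buckets
    return hit, miss

for n in (1_000, 10_000, 100_000):
    hit, miss = chaining_costs(n, 0.8)
    print(n, round(hit, 2), round(miss, 2))
```

At a = .8 the hit figure should hover near 1.4 and the miss figure near .8,
whatever the table size.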

You may also chain the colliding elements in a binary tree instead of a
linear list. The search within a chain then costs roughly log to the base 2
of the chain length, so the cost grows logarithmically instead of linearly
(as it does in most of the schemes above).

A friend of mine did some work with fixed-size tables that use a portion of
the table as an overflow chain area (the hash table was 85% of the total space
allocated and the reserved overflow area was 15%). When the entire table was
full, the total number of probes was never over about 4.1. This was with
a totally full table of about 75 to 100 elements. It is essentially chained
linear hashing with a fixed amount of space (unlike the above, which assumes
that you have infinite overflow space available). When the overflow section
fills up, you start allocating space from the hash table itself, so that when
the table is full, there are no empty slots.

	Hope this helps,
-- 
Jeff Lee
CSNet:	Jeff @ GATech		ARPA:	Jeff.GATech @ CSNet-Relay
uucp:	...!{akgua,allegra,rlgvax,sb1,unmvax,ulysses,ut-sally}!gatech!jeff