Why does this hash table constructor take the table size from a table of prime numbers?

netxuning 2006-12-20 05:26:23

/* Find a prime near, but greater than or equal to SIZE. The primes
   are looked up from a table with a selection of primes convenient
   for this purpose.

   PRIME_OFFSET is a minor optimization: it specifies the start position
   for the search for a large enough prime. The final offset is
   stored in the same variable. That way the list of primes does not
   have to be scanned from the beginning each time around. */

static int
prime_size (int size, int *prime_offset)
{
  static const int primes[] = {
    13, 19, 29, 41, 59, 79, 107, 149, 197, 263, 347, 457, 599, 787, 1031,
    1361, 1777, 2333, 3037, 3967, 5167, 6719, 8737, 11369, 14783,
    19219, 24989, 32491, 42257, 54941, 71429, 92861, 120721, 156941,
    204047, 265271, 344857, 448321, 582821, 757693, 985003, 1280519,
    1664681, 2164111, 2813353, 3657361, 4754591, 6180989, 8035301,
    10445899, 13579681, 17653589, 22949669, 29834603, 38784989,
    50420551, 65546729, 85210757, 110774011, 144006217, 187208107,
    243370577, 316381771, 411296309, 534685237, 695090819, 903618083,
    1174703521, 1527114613, 1837299131, 2147483647
  };
  int i;

  for (i = *prime_offset; i < countof (primes); i++)
    if (primes[i] >= size)
      {
        /* Set the offset to the next prime. That is safe because,
           next time we are called, it will be with a larger SIZE,
           which means we could never return the same prime anyway.
           (If that is not the case, the caller can simply reset
           *prime_offset.) */
        *prime_offset = i + 1;
        return primes[i];
      }

  abort ();
}

struct hash_table *
hash_table_new (int items,
                unsigned long (*hash_function) (const void *),
                int (*test_function) (const void *, const void *))
{
  int size;
  struct hash_table *ht = xnew (struct hash_table);

  ht->hash_function = hash_function ? hash_function : hash_pointer;
  ht->test_function = test_function ? test_function : cmp_pointer;

  /* If the size of struct hash_table ever becomes a concern, this
     field can go. (Wget doesn't create many hashes.) */
  ht->prime_offset = 0;

  /* Calculate the size that ensures that the table will store at
     least ITEMS keys without the need to resize. */
  size = 1 + items / HASH_MAX_FULLNESS;
  size = prime_size (size, &ht->prime_offset);
  ht->size = size;
  ht->resize_threshold = size * HASH_MAX_FULLNESS;
  /*assert (ht->resize_threshold >= items);*/

  ht->mappings = xnew_array (struct mapping, ht->size); /* allocate space for the hash table */

  /* Mark mappings as empty by setting every byte to 0xff (all one bits).
     We use 0xff rather than 0 to mark empty keys because it allows us
     to use NULL/0 as keys. */
  memset (ht->mappings, INVALID_PTR_BYTE, size * sizeof (struct mapping));

  ht->count = 0; /* a freshly created table has no occupied entries */

  return ht;
}

chai2010 2006-12-20

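Hash function: $h(k) = k \bmod m$.

In this case some values of m are clearly much better than others. For example, if m is even, then h(k) is even when k is even and odd when k is odd, which leads to a substantial bias in many files. It is even worse if m is a power of the computer's radix, because $k \bmod m$ is then just the least significant digits of k (independent of the other digits). Similarly, we can argue that m should probably not be a power of 3 either: if the keys are alphabetic, two keys that differ only by a permutation of their letters may differ in numeric value by just a multiple of 3 (this happens because $10^n \bmod 3 = 4^n \bmod 3 = 1$). In general, we want to avoid values of m that divide $r^k \pm a$, where k and a are small numbers and r is the radix of the character set (usually r = 64, 256 or 100), because the remainder modulo such an m tends to be just a superposition of the key's digits. Such considerations suggest choosing m to be a prime such that $r^k \not\equiv \pm a \pmod{m}$ for small k and a; this choice has been found to be quite satisfactory in virtually all cases.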

Excerpted from Knuth, The Art of Computer Programming, Volume 3, Section 6.4.

netxuning 2006-12-20