Why does this hash table constructor get the table size by looking it up in a table of primes?

netxuning 2006-12-20 05:26:23
/* Find a prime near, but greater than or equal to SIZE. The primes
are looked up from a table with a selection of primes convenient
for this purpose.

PRIME_OFFSET is a minor optimization: it specifies start position
for the search for the large enough prime. The final offset is
stored in the same variable. That way the list of primes does not
have to be scanned from the beginning each time around. */

static int
prime_size (int size, int *prime_offset)
{
  static const int primes[] = {
    13, 19, 29, 41, 59, 79, 107, 149, 197, 263, 347, 457, 599, 787, 1031,
    1361, 1777, 2333, 3037, 3967, 5167, 6719, 8737, 11369, 14783,
    19219, 24989, 32491, 42257, 54941, 71429, 92861, 120721, 156941,
    204047, 265271, 344857, 448321, 582821, 757693, 985003, 1280519,
    1664681, 2164111, 2813353, 3657361, 4754591, 6180989, 8035301,
    10445899, 13579681, 17653589, 22949669, 29834603, 38784989,
    50420551, 65546729, 85210757, 110774011, 144006217, 187208107,
    243370577, 316381771, 411296309, 534685237, 695090819, 903618083,
    1174703521, 1527114613, 1837299131, 2147483647
  };
  int i;

  for (i = *prime_offset; i < countof (primes); i++)
    if (primes[i] >= size)
      {
        /* Set the offset to the next prime. That is safe because,
           next time we are called, it will be with a larger SIZE,
           which means we could never return the same prime anyway.
           (If that is not the case, the caller can simply reset
           *prime_offset.) */
        *prime_offset = i + 1;
        return primes[i];
      }

  abort ();
}


struct hash_table *
hash_table_new (int items,
                unsigned long (*hash_function) (const void *),
                int (*test_function) (const void *, const void *))
{
  int size;
  struct hash_table *ht = xnew (struct hash_table);

  ht->hash_function = hash_function ? hash_function : hash_pointer;
  ht->test_function = test_function ? test_function : cmp_pointer;

  /* If the size of struct hash_table ever becomes a concern, this
     field can go. (Wget doesn't create many hashes.) */
  ht->prime_offset = 0;

  /* Calculate the size that ensures that the table will store at
     least ITEMS keys without the need to resize. */
  size = 1 + items / HASH_MAX_FULLNESS;
  size = prime_size (size, &ht->prime_offset);
  ht->size = size;
  ht->resize_threshold = size * HASH_MAX_FULLNESS;
  /*assert (ht->resize_threshold >= items);*/

  ht->mappings = xnew_array (struct mapping, ht->size); /* allocate space for hash-table */

  /* Mark mappings as empty. We use 0xff rather than 0 to mark empty
     keys because it allows us to use NULL/0 as keys. (Filling the
     table with all-one bits means "empty".) */
  memset (ht->mappings, INVALID_PTR_BYTE, size * sizeof (struct mapping));

  ht->count = 0; /* a freshly initialized table has no non-empty entries */

  return ht;
}
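
To make the sizing arithmetic in hash_table_new concrete, here is a minimal sketch. It assumes HASH_MAX_FULLNESS is 0.75 (that value is not shown in the post), uses only the first few primes from the table above, and the names sketch_table_size and sketch_resize_threshold are made up for illustration.

```c
#include <stddef.h>

/* Assumption: the load factor is 0.75; the post does not show the
   actual definition of HASH_MAX_FULLNESS.  */
#define SKETCH_MAX_FULLNESS 0.75

/* First few entries of the prime table quoted in the post.  */
static const int sketch_primes[] = {
  13, 19, 29, 41, 59, 79, 107, 149, 197, 263
};

/* Hypothetical helper: the size hash_table_new would pick for ITEMS
   keys, or -1 if this shortened prime table is too small.  */
static int
sketch_table_size (int items)
{
  int wanted = (int) (1 + items / SKETCH_MAX_FULLNESS);
  size_t i;

  for (i = 0; i < sizeof sketch_primes / sizeof sketch_primes[0]; i++)
    if (sketch_primes[i] >= wanted)
      return sketch_primes[i];
  return -1;
}

/* Hypothetical helper: the entry count at which a table of SIZE
   buckets would be considered full (truncated toward zero).  */
static int
sketch_resize_threshold (int size)
{
  return (int) (size * SKETCH_MAX_FULLNESS);
}
```

Under that assumption, a request for 100 items asks for at least 134 buckets, the first prime that fits is 149, and the resize threshold becomes 111, so the 100 items fit without resizing.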
5 replies
chai2010 2006-12-20

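Hash function: h(k) = k % m.

In this case some values of m are obviously much better than others.
For example, if m is even, h(k) is even whenever k is even and odd
whenever k is odd, which leads to a substantial bias in many files.
It is even worse if m is a power of the computer's radix, because
k % m is then just the least significant digits of k (independent of
the other digits). Similarly, we can argue that m probably should not
be a multiple of 3 either: if the keys are alphabetic, two keys that
differ only in the order of their letters may differ in numeric value
by a multiple of 3 (this happens because (10^n) % 3 == (4^n) % 3 == 1).
In general, we want to avoid values of m that divide (r^k) +/- a,
where k and a are small numbers and r is the radix of the character
set (usually r = 64, 256 or 100), because a remainder modulo such an m
tends to be little more than a superposition of the key's digits.
These considerations suggest choosing m to be a prime such that (r^k)
is not congruent to +/-a (modulo m) for small k and a; this choice has
been found to be quite satisfactory in virtually all cases.

Quoted from Knuth, "The Art of Computer Programming", Volume 3, Section 6.4.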
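
A small sketch of the argument above, assuming keys are byte strings read as base-256 numbers (r = 256). The function key_mod is a made-up name for illustration; it evaluates the key modulo m with Horner's rule and is only safe for m below 2^24, so the intermediate value fits in 32 bits.

```c
/* Hypothetical helper: treat the string S as a number written in
   base 256 and return that number modulo M (M must be nonzero and
   below 2^24 so that h * 256 + 255 cannot overflow 32 bits).  */
static unsigned long
key_mod (const char *s, unsigned long m)
{
  unsigned long h = 0;

  for (; *s != '\0'; s++)
    h = (h * 256 + (unsigned char) *s) % m;
  return h;
}
```

Because 256 % 3 == 1, key_mod gives the same result for "abc", "bca" and "cab" when m is 3: every rearrangement of the same letters lands in the same bucket. With m = 256 only the last character matters, so key_mod ("xyzc", 256) is just the code of 'c'. With the prime 1031 from the table in the question, "abc" and "cab" give 289 and 167, so the order of the letters does influence the result.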
netxuning 2006-12-20
cjq87 2006-12-20
I wonder whether some math expert here could prove this.
cjq87 2006-12-20
Presumably a prime size makes collisions less likely.
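
This is not a proof, but the effect is easy to observe for one common key pattern. The sketch below assumes the keys are multiples of a fixed step (for example, 8-byte-aligned addresses used directly as hash values) and that the bucket is simply k % m; count_used_buckets is a made-up name for illustration.

```c
#include <stdlib.h>

/* Hypothetical helper: hash the N keys 0, STEP, 2*STEP, ... with
   h(k) = k % M and return how many distinct buckets get used.
   Returns 0 if M is 0 or the scratch array cannot be allocated.  */
static unsigned
count_used_buckets (unsigned step, unsigned n, unsigned m)
{
  unsigned char *used;
  unsigned i, distinct = 0;

  if (m == 0)
    return 0;
  used = calloc (m, 1);
  if (!used)
    return 0;

  for (i = 0; i < n; i++)
    {
      unsigned bucket = (i * step) % m;
      if (!used[bucket])
        {
          used[bucket] = 1;
          distinct++;
        }
    }

  free (used);
  return distinct;
}
```

Under these assumptions, count_used_buckets (8, 1000, 1024) returns 128: the 1000 keys crowd into one eighth of the table. With the prime 1031, count_used_buckets (8, 1000, 1031) returns 1000, because 8 and 1031 share no common factor and every key gets its own bucket.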
netxuning 2006-12-20
This code is from wget.
