A question for the experts!

redeastfan 2010-04-08 12:32:39

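Here is the situation:
Pick 6 numbers from 1 to 33 (without replacement), sort them, and store them as an array, denoted a.
There are two large collections:
A: a1, a2, ..., aM
B: b1, b2, ..., bN
M and N are both large: M is typically a bit over 100,000 and N is over 300,000.

For every array ai in A, I need to count how many of b1, ..., bN have exactly 5 elements in common with ai.

My current approach is to write a function that counts how many elements two arrays share, call it find(a, b), where a and b are arrays and the return value is the number of common elements, and then simply loop over every pair:

for i:=0 to M-1 do
  for j:=0 to N-1 do
    if find(ai,bj)=5 then count[i]:=count[i]+1;

This is very slow. Is there a better way to improve the performance? Thanks.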
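
For reference, the brute-force baseline described in the question could look roughly like the sketch below. The original post does not show `find`, so the two-pointer version here is only an assumption that relies on both arrays being sorted ascending; the class and method names (`BruteForce`, `Find`, `CountSame5`) are made up for illustration.

```csharp
using System;

public static class BruteForce
{
    // Counts the elements two ascending-sorted arrays have in common
    // by walking both arrays with one index each (a merge-style scan).
    public static int Find(int[] a, int[] b)
    {
        int i = 0, j = 0, same = 0;
        while (i < a.Length && j < b.Length)
        {
            if (a[i] == b[j]) { same++; i++; j++; }
            else if (a[i] < b[j]) i++;
            else j++;
        }
        return same;
    }

    // For each array in A, counts the arrays in B that share exactly 5 elements with it.
    // This makes M * N calls to Find, which is why it is slow for large inputs.
    public static int[] CountSame5(int[][] A, int[][] B)
    {
        int[] count = new int[A.Length];
        for (int i = 0; i < A.Length; i++)
            foreach (int[] b in B)
                if (Find(A[i], b) == 5) count[i]++;
        return count;
    }
}
```

With M around 100,000 and N around 300,000 this is on the order of 3 × 10^10 comparisons of array pairs, so it is mainly useful as a correctness check for a faster version on small inputs.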

6 replies

redeastfan 2010-04-12

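[Quote=Quoting sbwwkmyd's reply in post #2:]
Build a hash from every array in B: each array turns into 6 entries, so there are at most 6 × 300,000 hash entries.
That way the cost for each ai is 6 lookups.
The total time complexity of the loop is roughly O(6N+6M).

Also, each of these arrays can be stored in a single int32 integer, which saves space and improves run-time efficiency.
[/Quote]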

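Does "each array in B turns into 6" mean taking every 5-element combination of its 6 numbers? If so, the 5-number combinations generated from all the arrays in B will surely contain duplicates, won't they? How do you build the hash in that case?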
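
One way to read the quoted suggestion, sketched here under the assumption that the hash stores a counter per key: a repeated 5-number combination is not a problem, it simply increments the counter for that key. The names `SubsetCounts`, `ToMask`, and `Build` are made up for this illustration, and a 64-bit mask is used because the numbers 1 to 33 need 33 bits.

```csharp
using System;
using System.Collections.Generic;

public static class SubsetCounts
{
    // Packs numbers in the range 1..33 into a 64-bit mask (bit k-1 is set for number k).
    public static ulong ToMask(int[] a)
    {
        ulong mask = 0;
        foreach (int x in a) mask |= 1UL << (x - 1);
        return mask;
    }

    // Maps every 5-number subset (as a mask) to the number of arrays in B that contain it.
    public static Dictionary<ulong, int> Build(int[][] B)
    {
        var counts = new Dictionary<ulong, int>(B.Length * 6);
        foreach (int[] b in B)
        {
            ulong full = ToMask(b);
            foreach (int x in b)
            {
                ulong key = full ^ (1UL << (x - 1)); // drop one of the six numbers
                int c;
                counts.TryGetValue(key, out c);      // c stays 0 when the key is new
                counts[key] = c + 1;                 // duplicates just raise the count
            }
        }
        return counts;
    }
}
```

For example, [1,2,3,4,5,6] and [1,2,3,4,5,7] both contain the combination {1,2,3,4,5}, so that key ends up with a count of 2 while the other ten keys have a count of 1.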

罗耗子 2010-04-08

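To add to the approach in post #2: the OP can represent every array in A and B as a single unsigned integer bitmask, e.g. a1[1,2,3,4,5,6]=63 (a 64-bit integer is needed, since the values 1 to 33 take 33 bits). Then XOR each element of A with each element of B: a result of 0 means all 6 numbers match, and a result with exactly two bits set means exactly 5 numbers match.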
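
A small sketch of the XOR test, for checking the idea on concrete values. Two 6-number masks that share exactly 5 numbers differ in exactly two bit positions (the one number unique to each side), so their XOR has two bits set; identical masks XOR to 0. The names `XorCheck`, `ToMask`, `PopCount`, and `SharesExactly5` are hypothetical, and the bit count is written by hand so that the sketch does not assume any particular framework version.

```csharp
using System;

public static class XorCheck
{
    // Packs numbers in the range 1..33 into a 64-bit mask (bit k-1 is set for number k).
    public static ulong ToMask(int[] a)
    {
        ulong mask = 0;
        foreach (int x in a) mask |= 1UL << (x - 1);
        return mask;
    }

    // Counts the set bits by repeatedly clearing the lowest one.
    public static int PopCount(ulong x)
    {
        int n = 0;
        while (x != 0) { x &= x - 1; n++; }
        return n;
    }

    // True when two masks, each built from 6 numbers, share exactly 5 of them.
    public static bool SharesExactly5(ulong a, ulong b)
    {
        return PopCount(a ^ b) == 2;
    }
}
```

This makes each pairwise comparison very cheap, but it is still a loop over all M × N pairs, unlike the hash approach from post #2.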

showjim 2010-04-08

I threw together a quick version; it has not been tested.

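        // Requires: using System.Collections.Generic;
        public static int[] same5count(int[][] A, int[][] B)
        {
            int[] value = new int[A.Length];

            #region Build the hash from B
            // The numbers run from 1 to 33, so a mask needs 33 bits: use ulong, not uint.
            // (In C#, 1U << 32 wraps around to 1U << 0, so 33 would collide with 1.)
            int index, n;
            ulong bitB, hashB;
            // 5-number subset mask -> how many arrays in B contain that subset
            Dictionary<ulong, int> count = new Dictionary<ulong, int>(B.Length * 6);
            // full 6-number mask -> how many arrays in B are exactly this array
            Dictionary<ulong, int> full = new Dictionary<ulong, int>(B.Length);
            foreach (int[] b in B)
            {
                for (bitB = 0, index = b.Length - 1; index >= 0; index--) bitB |= 1UL << (b[index] - 1);
                full.TryGetValue(bitB, out n);   // n is 0 when the key is new
                full[bitB] = n + 1;
                for (index = b.Length - 1; index >= 0; index--)
                {
                    hashB = bitB ^ (1UL << (b[index] - 1));   // drop one number
                    count.TryGetValue(hashB, out n);
                    count[hashB] = n + 1;
                }
            }
            #endregion

            #region Count
            int[] a;
            ulong bitA;
            for (int i = A.Length - 1; i >= 0; i--)
            {
                for (a = A[i], bitA = 0, index = a.Length - 1; index >= 0; index--) bitA |= 1UL << (a[index] - 1);
                for (index = a.Length - 1; index >= 0; index--)
                {
                    if (count.TryGetValue(bitA ^ (1UL << (a[index] - 1)), out n)) value[i] += n;
                }
                // An array in B identical to a matches all 6 subsets, but it shares
                // 6 numbers rather than 5, so take those matches back out.
                if (full.TryGetValue(bitA, out n)) value[i] -= 6 * n;
            }
            #endregion

            return value;
        }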