An algorithm for finding the mode

lokibalder 2008-12-30 04:48:43

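The mode is the value that occurs most often in an array.
I read the algorithm in the book (算法设计与分析习题解答), but it does not give complete code, so I don't know exactly how it is implemented.
I tried writing an algorithm myself in Java. The input is a sorted array; the output is the mode and the number of times it occurs. Some variable names may not be very clear, so I added a few comments. Does this differ in any major way from the code the book describes? Or is there a better method (without using a hash or buckets)? Any pointers are welcome.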

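public class ZhongShu {

    public static int largestNum = 0; // the mode found so far
    public static int largestN = 0;   // how many times that mode occurs

    public static void start(int[] array) {
        // Reset the shared state so that start() can be called more than once
        largestNum = 0;
        largestN = 0;
        recur(array, 0, array.length - 1);
    }

    public static void recur(int[] a, int l, int r) {
        int lr = 0; // number of elements left of the median that differ from it
        int rr = 0; // number of elements right of the median that differ from it
        // Stop only on an empty range. A single-element range must still be
        // counted, otherwise a one-element array would report a count of 0.
        if (l > r) {
            return;
        }

        int mpos = median(l, r); // position of the median; the array is sorted, so it is the middle index
        lr = median_left(a, l, mpos);
        rr = median_right(a, r, mpos);
        int mn = r - l - lr - rr + 1; // number of elements equal to the median

        // Update the best result if this run is longer than the one stored so far
        if (mn > largestN) {
            largestN = mn;
            largestNum = a[mpos];
        }

        // Recurse into a side only if it holds more elements than the median's run
        if (mn < lr) {
            recur(a, l, l + lr - 1);
        }
        if (mn < rr) {
            recur(a, r - rr + 1, r);
        }
    }

    // Finds the median: because the array is sorted, just return the middle index
    public static int median(int l, int r) {
        return l + (r - l) / 2;
    }

    // Counts the elements in [l, mpos) that are not equal to a[mpos]
    public static int median_left(int[] a, int l, int mpos) {
        int m = a[mpos];
        int i;
        for (i = mpos - 1; i >= l; --i) {
            if (a[i] != m) {
                break;
            }
        }
        return i - l + 1;
    }

    // Counts the elements in (mpos, r] that are not equal to a[mpos]
    public static int median_right(int[] a, int r, int mpos) {
        int m = a[mpos];
        int i;
        for (i = mpos + 1; i <= r; i++) {
            if (a[i] != m) {
                break;
            }
        }
        return r - i + 1;
    }

    public static void main(String args[]) {
        int aaa[] = {1, 1, 1, 2, 2, 3, 3, 3, 3, 5};
        ZhongShu.start(aaa);
        System.out.println("result is " + ZhongShu.largestNum + " " + ZhongShu.largestN);
    }
}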


12 replies


tian_dao_chou_qin 2009-07-25

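Thanks, everyone, for the algorithms.
But it seems the median isn't really needed here, is it?
Isn't this median-finding approach far too complicated?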

大王派我去巡山 2008-12-31

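Whether hashing or counting actually reaches O(n) depends on how the data are distributed; it is not guaranteed.

For a sorted array, the traversal can indeed be optimized the way the previous poster described.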
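To illustrate why counting depends on the data, here is a minimal Java sketch of a counting table indexed by value. The class name `CountingMode` is hypothetical, and the sketch assumes a non-empty array whose value range (max minus min) is small enough to allocate. Its cost is O(n + range), so it is linear only when the range is not much larger than n.

```java
import java.util.Arrays;

class CountingMode {
    // Returns {mode, count}. Ties go to the value that reaches the top count first.
    static int[] mode(int[] a) {
        int min = Arrays.stream(a).min().getAsInt();
        int max = Arrays.stream(a).max().getAsInt();
        // Assumption: max - min + 1 fits in an int and in memory.
        int[] counts = new int[max - min + 1];
        int best = a[0];
        int bestCount = 0;
        for (int v : a) {
            int c = ++counts[v - min];
            if (c > bestCount) {
                bestCount = c;
                best = v;
            }
        }
        return new int[] { best, bestCount };
    }

    public static void main(String[] args) {
        int[] r = mode(new int[] { 20, 21, 13, 24, 1, 13, 13, 2 });
        System.out.println(r[0] + " " + r[1]);
    }
}
```

With values spread over a huge range (say a few numbers near Integer.MIN_VALUE and a few near Integer.MAX_VALUE), the table no longer fits, which is the kind of distribution the reply warns about.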

lokibalder 2008-12-31

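[Quote=Reply #5 by dlyme:]
First, the original poster quietly changed the problem. The input in the original problem is not sorted.
Second, the original poster has missed what the code in the original problem is really about.

For the mode problem, the lower bound on running time is O(n*logn) for comparison-based methods (counting methods aside).
If you sort the array first, a single pass afterwards gives the result, so why bother with the median at all?
Has the original poster not considered whether this method also works on an unsorted array? (The median would of course be found differently.)

The book has already discussed how, in linear time…
[/Quote]
So that's how it is; I understand now. I hadn't read the book closely, so I didn't know what the median-finding code in the algorithm looks like, and the input I saw happened to be sorted. I now realize the problem never says the data are sorted.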

绿色夹克衫 2008-12-31

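With a hash, O(n) should be possible even when the array is unsorted!

If the array is sorted it is still O(n), but the constant factor hidden in the O can be below 1, because not every element has to be inspected.

Optimizations available in the sorted case:

1. Once a largest count Max has been found, the probing stride can be raised to Max/2. Any value whose count is >= Max is then guaranteed to be probed two or more times.
2. Use the median to split the array recursively into segments and count how often each value repeats. The Max already found can be used for pruning, which speeds things up considerably when the data contain many repeats.
For example, if some element makes up more than half of all elements, it can be found within 4 probes.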
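Optimization 1 can be sketched in Java roughly as follows. This is one possible reading of the idea, not the poster's own code: the class name `StrideMode` and the helpers `lowerBound` and `upperBound` are hypothetical, and binary search is an assumption for locating the exact ends of a run once two consecutive probes hold the same value. The input is assumed to be non-empty and sorted in ascending order.

```java
class StrideMode {
    // First index in [from, to) whose element is >= key, or `to` if there is none.
    static int lowerBound(int[] a, int from, int to, int key) {
        while (from < to) {
            int mid = (from + to) >>> 1;
            if (a[mid] < key) from = mid + 1; else to = mid;
        }
        return from;
    }

    // First index in [from, to) whose element is > key, or `to` if there is none.
    static int upperBound(int[] a, int from, int to, int key) {
        while (from < to) {
            int mid = (from + to) >>> 1;
            if (a[mid] <= key) from = mid + 1; else to = mid;
        }
        return from;
    }

    // Returns {mode, count}; on a tie the smaller value wins.
    static int[] mode(int[] a) {
        int n = a.length;
        int best = a[0];
        int max = 1;
        int start = 0; // start of the first run that has not been measured yet
        int p = 0;     // current probe position
        while (true) {
            int step = Math.max(1, max / 2);
            if (p + step >= n) break;
            if (a[p] == a[p + step]) {
                // Two consecutive probes agree, so measure the whole run exactly.
                int lo = lowerBound(a, start, p, a[p]);
                int hi = upperBound(a, p + step + 1, n, a[p]);
                if (hi - lo > max) {
                    max = hi - lo;
                    best = a[p];
                }
                start = hi;
                p = hi; // resume probing at the next run
            } else {
                p += step;
            }
        }
        return new int[] { best, max };
    }

    public static void main(String[] args) {
        int[] r = mode(new int[] { 1, 1, 1, 2, 2, 3, 3, 3, 3, 5 });
        System.out.println(r[0] + " " + r[1]);
    }
}
```

Any run longer than the current Max spans at least two probe positions, so it cannot be skipped; shorter runs may be stepped over without being measured, which is where the saving comes from.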



        private void button2_Click(object sender, EventArgs e)
        {
            int[] array = new int[] { 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 6, 6, 7, 8, 10, 11, 13, 13, 13, 13, 13, 13, 13, 13, 19, 20, 21, 24 };
            int[] k;

            // Find the modes of a sorted array
            k = Zhongshu1(array);

            array = new int[] { 20, 21, 13, 24, 1, 13, 13, 13, 1, 2, 10, 11, 13, 13, 2, 2, 2, 3, 3, 3, 2, 6, 7, 2, 2, 2, 6, 8, 13, 13, 13, 2, 19 };

            // Find the modes of an unsorted array
            k = ZhongshuNoSort(array);
        }

        // Finds all modes of a sorted array (returns null if no value repeats)
        private int[] Zhongshu1(int[] source)
        {
            int currentValue = source[0];
            int[] zhongshu = new int[source.Length];

            int max = 1;
            int count = 0;
            int position = -1;

            // i runs one past the end so that the last run is also closed
            for (int i = 0; i <= source.Length; i++)
            {
                if (i < source.Length && source[i] == currentValue)
                    count++;
                else
                {
                    if (count > max)
                    {
                        // A larger count has appeared: restart the result list
                        max = count;
                        position = 0;
                        zhongshu[0] = source[i - 1];
                    }
                    else if (count == max)
                        zhongshu[++position] = source[i - 1];

                    if (i < source.Length)
                    {
                        currentValue = source[i];
                        count = 1;
                    }
                }
            }

            // No mode: no value occurs more than once
            if (max <= 1)
                return null;

            Array.Resize(ref zhongshu, position + 1);
            return zhongshu;
        }

        // Finds all modes of an unsorted array (returns null if no value repeats)
        // Dictionary needs: using System.Collections.Generic;
        private int[] ZhongshuNoSort(int[] source)
        {
            int count;
            bool flag = false; // becomes true once any value is seen a second time
            Dictionary<int, int> myDict = new Dictionary<int, int>();

            for (int i = 0; i < source.Length; i++)
            {
                if (myDict.TryGetValue(source[i], out count))
                {
                    flag = true;
                    myDict[source[i]]++;
                }
                else
                    myDict.Add(source[i], 1);
            }

            // No mode: no value occurs more than once
            if (!flag)
                return null;

            int max = 0;
            int position = 0;
            int[] zhongshu = new int[source.Length];

            // Walk the hash table; it holds at most n entries, so this is at most O(n)
            foreach (KeyValuePair<int, int> myKey in myDict)
            {
                if (myKey.Value > max)
                {
                    max = myKey.Value;
                    position = 0;
                    zhongshu[0] = myKey.Key;
                }
                else if (myKey.Value == max)
                    zhongshu[++position] = myKey.Key;
            }

            Array.Resize(ref zhongshu, position + 1);
            return zhongshu;
        }

yaos 2008-12-30

To output both the number and its count:
Prelude List>(\x -> (head x, length x)) $ foldl (\x y -> if (length x) > (length y) then x else y) [] $ group $ sort l

yaos 2008-12-30

Haskell code
Assume the data are stored in l
Prelude List> foldl (\x y -> if (length x) > (length y) then x else y) [] $ group $ sort l

knate 2008-12-30

In essence this amounts to sorting as well, doesn't it?

This problem should be reducible to sorting.
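The reduction to sorting can be sketched in Java as a sort followed by a single pass over the runs of equal values. The class name `SortScanMode` is hypothetical, and the sketch assumes a non-empty array; it copies the input so the caller's array is left unchanged.

```java
import java.util.Arrays;

class SortScanMode {
    // Returns {mode, count}; on a tie the smallest value wins.
    static int[] mode(int[] input) {
        int[] a = input.clone();
        Arrays.sort(a); // O(n log n); the scan below is O(n)
        int best = a[0];
        int bestCount = 0;
        int runStart = 0;
        // i runs one past the end so that the final run is closed as well.
        for (int i = 1; i <= a.length; i++) {
            if (i == a.length || a[i] != a[runStart]) {
                int len = i - runStart;
                if (len > bestCount) {
                    bestCount = len;
                    best = a[runStart];
                }
                runStart = i;
            }
        }
        return new int[] { best, bestCount };
    }

    public static void main(String[] args) {
        int[] r = mode(new int[] { 3, 1, 3, 2, 1, 3 });
        System.out.println(r[0] + " " + r[1]);
    }
}
```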

大王派我去巡山 2008-12-30

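First, the original poster quietly changed the problem. The input in the original problem is not sorted.
Second, the original poster has missed what the code in the original problem is really about.

For the mode problem, the lower bound on running time is O(n*logn) for comparison-based methods (counting methods aside).
If you sort the array first, a single pass afterwards gives the result, so why bother with the median at all?
Has the original poster not considered whether this method also works on an unsorted array? (The median would of course be found differently.)

The book has already discussed how to find the k-th largest element in linear time. Building on that, we can find the median in linear time, partition the data around it, and then divide and conquer.
T(n)=2*T(n/2)+O(n)
Without sorting the data at all, the mode can still be found in O(n*logn) time. That is the idea behind the code.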
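The partition-then-recurse idea can be sketched in Java as below. Note the substitution: instead of the book's linear-time median selection, this sketch picks a random pivot and uses a three-way partition, so the O(n*logn) bound holds only in expectation rather than in the worst case. The class name `PartitionMode` and its members are hypothetical, and a non-empty input is assumed.

```java
import java.util.Random;

class PartitionMode {
    private static final Random RNG = new Random();
    private int best;      // most frequent value found so far
    private int bestCount; // how many times it occurs

    // Returns {mode, count}. Works on a copy because partitioning reorders elements.
    static int[] mode(int[] input) {
        PartitionMode m = new PartitionMode();
        m.solve(input.clone(), 0, input.length - 1);
        return new int[] { m.best, m.bestCount };
    }

    private void solve(int[] a, int lo, int hi) {
        // Prune: a range no larger than the best count cannot beat it
        // (this also covers the empty range).
        if (hi - lo + 1 <= bestCount) return;
        int pivot = a[lo + RNG.nextInt(hi - lo + 1)];
        // Three-way partition:
        // a[lo..lt-1] < pivot, a[lt..gt] == pivot, a[gt+1..hi] > pivot.
        int lt = lo, i = lo, gt = hi;
        while (i <= gt) {
            if (a[i] < pivot) swap(a, lt++, i++);
            else if (a[i] > pivot) swap(a, i, gt--);
            else i++;
        }
        int equal = gt - lt + 1;
        if (equal > bestCount) {
            bestCount = equal;
            best = pivot;
        }
        solve(a, lo, lt - 1);
        solve(a, gt + 1, hi);
    }

    private static void swap(int[] a, int x, int y) {
        int t = a[x];
        a[x] = a[y];
        a[y] = t;
    }

    public static void main(String[] args) {
        int[] r = mode(new int[] { 20, 21, 13, 24, 1, 13, 13, 2 });
        System.out.println(r[0] + " " + r[1]);
    }
}
```

When several values tie for the top count, which one is reported depends on the pivots chosen; the count itself is always the same.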

大王派我去巡山 2008-12-30

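[Quote=Reply #3 by sssssjjjj:]
Nice algorithm
Complexity below O(n)
Learning from it!
[/Quote]
Complexity below O(n)? Of course that's impossible!
T(n)=2*T(n/2), so it is still an O(n) algorithm; nothing has fundamentally changed.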
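For reference, the two recurrences that come up in this thread have the following standard closed forms. This is a sketch under stated assumptions: n is a power of two, c > 0 is a constant, T(1) = c, and the first recurrence is read as having constant extra work per call.

```latex
\begin{align*}
T(n) &= 2\,T(n/2) + c    &&\Rightarrow\quad T(n) = c\,(2n - 1) = O(n) \\
T(n) &= 2\,T(n/2) + c\,n &&\Rightarrow\quad T(n) = c\,n\log_2 n + c\,n = O(n \log n)
\end{align*}
```

The first corresponds to recursing on a sorted array with constant work at each split; the second to partitioning unsorted data in linear time before each split.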

sssssjjjj 2008-12-30

Nice algorithm
Complexity below O(n)
Learning from it!

lokibalder 2008-12-30

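[Quote=Reply #1 by dlyme:]
Quoting the original post by lokibalder:
I tried writing an algorithm myself; the input is a sorted array...

If the array is already sorted, a single pass gives the result. Is there anything left to discuss?
[/Quote]
The book's algorithm works by finding the median. I don't know the actual code; could someone outline the algorithm?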

大王派我去巡山 2008-12-30