A question about an algorithm.

keke8247 2014-06-19 10:39:52

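How can I find the elements that different arrays have in common, and count how many times those common elements occur?
For example: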



    String[] str = {"12","121"};
    String[] str1 = {"12","121"};
    String[] str2 = {"12","121","122"};
    String[] str3 = {"123","1234","125","126","111"};
    String[] str4 = {"1234","126","125","1232","111"};
    String[] str6 = {"12","121","222","1433","1234","126","125","1232"};

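The result I need:
12, 121 appear together 4 times
12, 121, 122 appear together 2 times
1234, 125, 126 appear together 3 times
1234, 126, 125, 111 appear together 2 times
……
This is urgent; I am online waiting for answers.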

朗晴 2014-07-01

This is a brain teaser; what it really tests is algorithms.

okafor2011 2014-07-01

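Iterate over every possible combination in every group and put each one into a map as a key; if the key already exists, increment its count; at the end, output the map.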

iamfutureto 2014-06-29

Here to see how the experts answer this.

黄豆粒 2014-06-29

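I'll only describe the general idea. Suppose there are 6 arrays, as follows:

    string[] str1 = { "12", "121" };
    string[] str2 = { "12", "121", "122" };
    string[] str3 = { "123", "1234", "125", "126", "111" };
    string[] str4 = { "1234", "126", "125", "1232", "111" };
    string[] str6 = { "12", "121", "222", "1433", "1234", "126", "125", "1232" };

Step 1: going from str1 to str6, enumerate every combination of each array's elements taken in their original order, which gives 6 new arrays:

    string[][] array1 = {{"12","121"}};
    string[][] array2 = {{"12","121"},{"121","122"},{ "12", "121", "122" }};
    . . .
    string[][] array6 = {...};

Step 2:
1. Take the first element of array1, {"12","121"}, and compare it one by one with the elements of the other arrays, starting from array2. If anything matches, record the number of matches, and store the array1 element {"12","121"} together with the match count in a Result object:

    class Result {
        // element sequence
        public String[] array;
        // number of occurrences
        public int count;
    }

   The Result objects are kept in a HashMap (hash table) keyed by the element sequence.
2. Take the first element of array2 and look it up in the HashMap. If it is not there, do the same operation as before; if it is, go to 3.
3. Take the second element of array2 and do the same as in 2, then the third element of array2, then the first element of array3, and so on. Once the last element of array6 has been processed, just iterate over the HashMap and print the results.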
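The two steps above can be sketched in Java roughly as follows. This is only a sketch under assumptions, not the poster's actual code: it reads "combinations in their original order" as contiguous runs of at least two elements (which matches the array2 example), it counts each run once per array in a single pass instead of comparing arrays pairwise, and it uses List<String> as the map key because a String[] key in a HashMap is compared by identity rather than by content. The names ContiguousRunCounter, runs and count are made up for illustration; Map.merge requires Java 8 or later.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class ContiguousRunCounter {

  // Step 1: every contiguous run of length >= 2, in the original order.
  static List<List<String>> runs(String[] array) {
    List<List<String>> result = new ArrayList<>();
    for (int start = 0; start < array.length; start++) {
      for (int end = start + 2; end <= array.length; end++) {
        result.add(new ArrayList<>(Arrays.asList(array).subList(start, end)));
      }
    }
    return result;
  }

  // Step 2: count in how many arrays each run occurs.
  static Map<List<String>, Integer> count(List<String[]> arrays) {
    Map<List<String>, Integer> counts = new HashMap<>();
    for (String[] array : arrays) {
      // De-duplicate so a run repeated inside one array counts once for that array.
      Set<List<String>> distinct = new HashSet<>(runs(array));
      for (List<String> run : distinct) {
        counts.merge(run, 1, Integer::sum);
      }
    }
    return counts;
  }

  public static void main(String[] args) {
    List<String[]> arrays = Arrays.asList(
        new String[] {"12", "121"},
        new String[] {"12", "121", "122"});
    // Prints each run with the number of arrays it occurs in (map order is unspecified).
    System.out.println(count(arrays));
  }
}
```

Because the keys are ordered lists, {"12","121"} and {"121","12"} count as different runs here; if order should not matter, a Set-based key would be needed instead.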

futianerxd 2014-06-25

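If the order in which the elements of your combinations appear (for example "11", "22") is fixed, you can implement this directly with an inverted index. If the order is not fixed, you can replace the strings with numbers and build the inverted index on their product...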
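A minimal Java sketch of the inverted-index idea, under stated assumptions: each element maps to the set of ids of the arrays that contain it, and the count for a combination is the size of the intersection of those sets. Because it indexes single elements, this sketch does not depend on element order, and it does not implement the "product of numbers" variant. The names InvertedIndexCounter, addAll and count are illustrative only and are not taken from any code posted in this thread.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class InvertedIndexCounter {
  // element -> ids of the arrays that contain it (the "posting set")
  private final Map<String, Set<Integer>> postings = new HashMap<>();
  private int nextArrayId = 0;

  // Index a batch of arrays; may be called repeatedly to add more data.
  void addAll(String[][] arrays) {
    for (String[] array : arrays) {
      int id = nextArrayId++;
      for (String element : array) {
        postings.computeIfAbsent(element, k -> new HashSet<>()).add(id);
      }
    }
  }

  // Number of indexed arrays that contain all of the given elements.
  int count(String... elements) {
    if (elements.length == 0) {
      return 0;
    }
    Set<Integer> common = null;
    for (String element : elements) {
      Set<Integer> ids = postings.getOrDefault(element, Collections.<Integer>emptySet());
      if (common == null) {
        common = new HashSet<>(ids); // copy so the index itself is never modified
      } else {
        common.retainAll(ids);
      }
      if (common.isEmpty()) {
        return 0; // no array can contain every element
      }
    }
    return common.size();
  }

  public static void main(String[] args) {
    InvertedIndexCounter counter = new InvertedIndexCounter();
    counter.addAll(new String[][] {{"12", "121"}, {"12", "121", "122"}});
    System.out.println(counter.count("12", "121"));
  }
}
```

Building the index costs one pass over all elements; each query then costs roughly the size of the posting sets involved, which is the "slow setup, fast lookup" trade-off typical of this structure.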

shine333 2014-06-24

  // Benchmark driver. OccurrenceCounter and MAX_ELEMENTS_IN_AN_ARRAY are not shown in this excerpt.
  public static void main(String[] args) {
    int samples = 5000;      // number of distinct element values
    int arrayCount = 200000; // number of arrays to index
    int tests = 1000;        // number of lookups to time

    OccurrenceCounter counter = new OccurrenceCounter();
    Random random = new Random();
    // Fill every array with random elements drawn from [0, samples).
    String[][] arrays = new String[arrayCount][MAX_ELEMENTS_IN_AN_ARRAY];
    for (String[] array : arrays) {
      for (int i = 0; i < array.length; i++) {
        array[i] = String.valueOf(random.nextInt(samples));
      }
    }
    long start1 = System.nanoTime(); // start of initialization
    counter.addAll(arrays);
    int count = 0;
    long start2 = System.nanoTime(); // start of lookups
    for (int i = 0; i < tests; i++) {
      String x = String.valueOf(random.nextInt(samples));
      String y = String.valueOf(random.nextInt(samples));
      // String z = String.valueOf(random.nextInt(samples));
      count += counter.count(x, y /* , z */);
    }
    System.out.println(count);
    long end = System.nanoTime();
    System.out.println((end - start1) * 0.000000001); // initialization + lookups, in seconds
    System.out.println((end - start2) * 0.000000001); // lookups only, in seconds
  }

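I ran a test. The lookup speed (end - start2) fully meets the requirement: finding the occurrence counts of 500 pairs across 200,000 arrays (5 elements per array, at most 5,000 distinct sample values) takes under 200 milliseconds. Initialization, however, is relatively slow, over 1 second, and may use up to 900 partitions. In other words, this algorithm is very fast at lookups but slow to initialize, and it trades space for time throughout. So whether to use it depends entirely on the original poster's situation. It suits cases where the sample data is relatively fixed but different combinations are looked up frequently, or where changes are only small batches of additions (addAll can be called multiple times). Conversely, if the large data set changes often, so that after two or three lookups it may be completely different, this approach is not suitable.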

jiangchao419 2014-06-24

This looks really complicated to me; I don't want to think about it!

shine333 2014-06-24

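  import java.util.HashMap;
  import java.util.HashSet;
  import java.util.LinkedList;
  import java.util.List;
  import java.util.Map;
  import java.util.Set;

  // Wrapper class added so the snippet compiles; the class name is arbitrary.
  public class CombinationCount {

    // Returns every subset of the array that has at least 2 elements.
    public static List<Set<String>> allCombinations(String[] array) {
      List<Set<String>> combinations = new LinkedList<Set<String>>();
      // Iterate over the bitmasks from 2^N - 1 down to 1; bit j selects array[j].
      for (int i = 1 << array.length; i-- > 1;) {
        Set<String> combination = new HashSet<String>();
        for (int j = 0; j < array.length; j++) {
          if ((i & (1 << j)) != 0) {
            combination.add(array[j]);
          }
        }
        // Single-element subsets are not combinations; skip them.
        if (combination.size() >= 2) {
          combinations.add(combination);
        }
      }
      return combinations;
    }

    public static void main(String[] args) {
      String[][] arrays = {{"1", "2", "3", "4", "5"}, {"1", "3", "5"}};
      // combination -> number of times it has been seen
      Map<Set<String>, Integer> counter = new HashMap<Set<String>, Integer>();
      for (String[] array : arrays) {
        List<Set<String>> allCombinations = allCombinations(array);
        for (Set<String> combination : allCombinations) {
          int count = 0;
          if (counter.containsKey(combination)) {
            count = counter.get(combination);
          }
          counter.put(combination, count + 1);
        }
      }

      System.out.println(counter);
    }
  }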

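The conceptual code is above. Apart from a few optimizations, such as the initial sizes of the containers, this is basically all there is to it.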
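To connect the conceptual code to the arrays in the original question, here is a hedged sketch that runs the same subset-enumeration idea on that data and keeps only the combinations seen in at least two arrays. The names CombinationTable, countAll and table are hypothetical. Two choices are assumptions of this sketch rather than part of the code above: keys are TreeSet so that rows print in a stable element order, and each subset is counted at most once per array, so duplicate elements inside one array cannot inflate a count. The bitmask loop only works for arrays shorter than 31 elements, and the number of subsets doubles with every extra element.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class CombinationTable {

  // For every subset of size >= 2, count the arrays that contain it.
  static Map<Set<String>, Integer> countAll(List<String[]> arrays) {
    Map<Set<String>, Integer> counts = new HashMap<>();
    for (String[] array : arrays) {
      // Subsets already counted for this array.
      Set<Set<String>> seen = new HashSet<>();
      for (int mask = 1; mask < (1 << array.length); mask++) {
        Set<String> combination = new TreeSet<>();
        for (int j = 0; j < array.length; j++) {
          if ((mask & (1 << j)) != 0) {
            combination.add(array[j]);
          }
        }
        if (combination.size() >= 2 && seen.add(combination)) {
          counts.merge(combination, 1, Integer::sum);
        }
      }
    }
    return counts;
  }

  // One text row per combination that occurs in at least minCount arrays, sorted.
  static List<String> table(Map<Set<String>, Integer> counts, int minCount) {
    List<String> rows = new ArrayList<>();
    for (Map.Entry<Set<String>, Integer> entry : counts.entrySet()) {
      if (entry.getValue() >= minCount) {
        rows.add(entry.getKey() + " appear together " + entry.getValue() + " times");
      }
    }
    Collections.sort(rows);
    return rows;
  }

  public static void main(String[] args) {
    List<String[]> arrays = Arrays.asList(
        new String[] {"12", "121"},
        new String[] {"12", "121"},
        new String[] {"12", "121", "122"},
        new String[] {"123", "1234", "125", "126", "111"},
        new String[] {"1234", "126", "125", "1232", "111"},
        new String[] {"12", "121", "222", "1433", "1234", "126", "125", "1232"});
    for (String row : table(countAll(arrays), 2)) {
      System.out.println(row);
    }
  }
}
```

On the six arrays as posted, this counts 12 and 121 together in 4 arrays, and 1234, 125, 126 together in 3. The combination 12, 121, 122 occurs in only one of the posted arrays, so it does not reach the 2 given in the question's expected result.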

shine333 2014-06-24

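With the requirement as you understand it, is there even an algorithm to speak of? Keep one big Map<Set<String>, Integer>. For each original array, exhaustively enumerate all of its combinations as Sets. For example, [1,2,3,4,5] is expanded into a number of Sets (Sets are used so the arrays don't have to be sorted): [1,2] [1,3] [1,4] [1,5] [2,3] ... [1,2,3] ... [3,4,5] ... [1,2,3,4,5]. Then, for each one, just add 1 to its Integer in the big Map. As for enumerating all combinations of each array, that can be done with binary bitmasks; code to follow shortly.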

很久之前就开始了 2014-06-24

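Quoting shine333's reply (#73):

Finally, your code on floor 49 has basically no reference value. Your algorithm's time complexity is far too high, and it doesn't save anything on space complexity either.

Forgive me, no offense intended! The code on floor 49 reflects my understanding of the original poster's requirement; in terms of time and space complexity, I also feel it has little value. Back to the requirement, quoting the original poster: "How can I find the elements that different arrays have in common, and count how many times those common elements occur?" It is quite clear that the result must be a statistics table (of combinations rather than single elements, as the original poster explained in later posts):

    Combination      Occurrences
    Combination 1    10
    Combination 2    100
    ......

In code terms: you are given a List<String[]> and you produce the table above. I ran your code; it takes a combination and returns that combination's occurrence count, so I think it differs from the requirement! I'm a newbie, so if I have misunderstood anything, please correct me! Quoting the Javadoc of the count method from your code: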


  /**
   * Counts how many times the array occurs.
   *
   * @param strings
   *          the elements that must appear together
   * @return the number of occurrences
   */

风行傲天 2014-06-24

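Quoting keke8247's reply (#24):

Quoting shine333's reply (#23):
How many is "a lot"?
At least 5000+.

Surely that much data isn't all stored in arrays! Are you sure it isn't in a database? If it is, writing SQL directly would be faster and simpler.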

shine333 2014-06-24

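Finally, your code on floor 49 has basically no reference value. Your algorithm's time complexity is far too high, and it doesn't save anything on space complexity either.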

shine333 2014-06-24

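Quoting maihao110's reply (#69):

Going by what the original poster means, yours is wrong! What the original poster wants is not to look up the occurrence count of a single combination, but to count the occurrences of every combination that can appear! See the code on floor #49, or help me optimize it; I would be very grateful!

My single count method is exactly what the original poster asked for. In the test method, the count += ... was just written offhand to keep HotSpot from optimizing the code away.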

shine333 2014-06-24

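Quoting maihao110's reply (#69):

Going by what the original poster means, yours is wrong! What the original poster wants is not to look up the occurrence count of a single combination, but to count the occurrences of every combination that can appear! See the code on floor #49, or help me optimize it; I would be very grateful!

Read and understand it before you comment...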

愤飞的小鸭 2014-06-24

Marking this thread. Is this really an algorithm question...?

很久之前就开始了 2014-06-24

Quoting shine333's reply (#66), the benchmark code and timing notes posted above:

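Going by what the original poster means, yours is wrong! What the original poster wants is not to look up the occurrence count of a single combination, but to count the occurrences of every combination that can appear! See the code on floor #49, or help me optimize it; I would be very grateful!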