Paginating the results of an Elasticsearch terms aggregation (solved)

潦草的人生 2022-12-09 14:20:47

I. Scenario



// Document count
GET /test_data/_count

{
  "count" : 300018,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  }
}



// Document structure
GET /test_data/_search?size=2


{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 10000,
      "relation" : "gte"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "test_data",
        "_type" : "gtid",
        "_id" : "kVc-7IQBTJ6sJOcxHqQH",
        "_score" : 1.0,
        "_source" : {
          "set" : "192.168.37.39",
          "gtid" : "5919e4aa-9c-a9-69744452-19"
        }
      },
      {
        "_index" : "test_data",
        "_type" : "gtid",
        "_id" : "klc-7IQBTJ6sJOcxHqQH",
        "_score" : 1.0,
        "_source" : {
          "set" : "192.168.37.40",
          "gtid" : "81c8fb6b-59-55-19621cdf-70"
        }
      }
    ]
  }
}

I aggregate on gtid, with a nested aggregation on set inside each gtid bucket:

GET /test_data/_search?filter_path=aggregations
{
  "query": {
    "match_all": {}
  },
  "aggs": {
    "many_ipaddr_by_gtid": {
      "terms": {
        "field": "gtid.keyword",
        "size": 2
      },
      "aggs": {
        "ip_addr": {
          "terms": {
            "field": "set.keyword"
          }
        }
      }
    }
  }
}


{
  "aggregations" : {
    "many_ipaddr_by_gtid" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 300008,
      "buckets" : [
        {
          "key" : "1-2e-2-637f4ff1-1f",
          "doc_count" : 8,
          "ip_addr" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : "192.168.37.42",
                "doc_count" : 4
              },
              {
                "key" : "192.168.37.37",
                "doc_count" : 3
              },
              {
                "key" : "192.168.37.38",
                "doc_count" : 1
              }
            ]
          }
        },
        {
          "key" : "1-2e-3-637f4ff3-17",
          "doc_count" : 2,
          "ip_addr" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : "192.168.37.37",
                "doc_count" : 1
              },
              {
                "key" : "192.168.37.42",
                "doc_count" : 1
              }
            ]
          }
        }
      ]
    }
  }
}

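II. The problem

1. I need to paginate the aggregated results.

There are two parts: getting the total number of pages, and fetching a given page.

To be clear, the pages should be counted over the gtid buckets, not over the set buckets.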
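As a rough illustration of the first part (the total page count), here is a minimal sketch that assumes the count is derived from the number of distinct gtid values, obtained with a `cardinality` aggregation. The function names are hypothetical. Note that `cardinality` is an approximate count above its `precision_threshold`, so the page count may be slightly off on a large index.

```python
import math


def cardinality_request(field="gtid.keyword"):
    """Build a search body that returns no hits, only the distinct count of `field`."""
    return {
        "size": 0,
        "aggs": {"count_gtids": {"cardinality": {"field": field}}},
    }


def total_pages(distinct_count, page_size):
    """Number of pages needed to show `distinct_count` buckets, `page_size` per page."""
    if page_size < 1:
        raise ValueError("page_size must be at least 1")
    return math.ceil(distinct_count / page_size)
```

The body from `cardinality_request()` would be sent to `GET /test_data/_search`, and the value read from `aggregations.count_gtids.value` passed to `total_pages`.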

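III. References I consulted

1. https://www.cnblogs.com/LiuFqiang/p/16793018.html

2. https://blog.csdn.net/UbuntuTouch/article/details/103679273

3. https://blog.csdn.net/laoyang360/article/details/79112946

The first one is the easiest for me to follow, but I still can't work out how to apply it.

It contains a query like the one below. I adapted it to my own data, but I could not get any useful result: nothing changed compared with my first query.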

GET /test_data/_search
{
  "from": 0,
  "size": 0,
  "query": {
    "match_all": {}
  },
  "aggs": {
    "many_ipaddr_by_gtid": {
      "terms": {
        "field": "gtid.keyword",
        "size": 3
      },
      "aggs": {
        "gtid_sort": {
          "bucket_sort": {
            "from": 1,
            "size": 3,
            "sort": []
          }
        },
        "aggs": {
          "ip_addr": {
            "terms": {
              "field": "set.keyword"
            }
          }
        }
      }
    },
    "count_gtids": {
      "cardinality": {
        "field": "extern.paragraph_id"
      }
    }
  }
}



// Error:


{
  "error" : {
    "root_cause" : [
      {
        "type" : "illegal_argument_exception",
        "reason" : "Aggregation [bucketSort] cannot define sub-aggregations"
      }
    ],
    "type" : "illegal_argument_exception",
    "reason" : "Aggregation [bucketSort] cannot define sub-aggregations"
  },
  "status" : 400
}

I don't really understand bucket_sort or how pagination works after aggregating. Could someone take a look at this?
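The error message says that `bucket_sort` is a pipeline aggregation and cannot carry sub-aggregations, so one plausible fix is to make `ip_addr` a sibling of `gtid_sort` inside the `aggs` of the terms aggregation. The sketch below is an assumption about the intended query, not a verified solution: it also raises the terms `size` to `from + size` (otherwise there are too few buckets to skip into) and points the cardinality at `gtid.keyword` rather than `extern.paragraph_id`, which does not appear in the sample documents. The function name and parameters are hypothetical.

```python
def paged_terms_request(page, page_size,
                        gtid_field="gtid.keyword", set_field="set.keyword"):
    """Build a search body that returns one page of gtid buckets (pages start at 1)."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be at least 1")
    offset = (page - 1) * page_size
    return {
        "size": 0,
        "query": {"match_all": {}},
        "aggs": {
            "many_ipaddr_by_gtid": {
                # terms must produce every bucket up to the end of the requested page
                "terms": {"field": gtid_field, "size": offset + page_size},
                "aggs": {
                    # per-gtid breakdown by set, a sibling of the bucket_sort
                    "ip_addr": {"terms": {"field": set_field}},
                    # pipeline aggregation: keeps only the buckets of this page
                    "gtid_sort": {
                        "bucket_sort": {"from": offset, "size": page_size}
                    },
                },
            },
            # distinct gtid count, used to derive the total number of pages
            "count_gtids": {"cardinality": {"field": gtid_field}},
        },
    }
```

Because the terms aggregation still has to build `from + size` buckets on every request, deep pages get expensive and may run into the cluster's bucket limit, so this approach suits shallow paging only.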


3 replies


潦草的人生 2022-12-12


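Field collapsing.

I got the behaviour I wanted with field collapsing. Aggregations were more complex than I needed: the data set is large but the documents have only a few fields, so collapsing on the field is enough to cover this feature.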
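The reply does not show the final query, so the following is only a guess at what a field-collapsing request for this index might look like. It assumes collapsing on `gtid.keyword`, ordinary `from`/`size` paging over the collapsed hits, `inner_hits` to list the documents of each gtid, and a `cardinality` aggregation for the number of distinct gtids (the hit total reported with `collapse` counts documents, not groups). The function name, the `inner_hits` name, and the sort order are all hypothetical choices.

```python
def collapsed_page_request(page, page_size, inner_size=10,
                           gtid_field="gtid.keyword"):
    """Build a search body that returns one page of hits collapsed by gtid."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be at least 1")
    return {
        "from": (page - 1) * page_size,  # offset counted in collapsed groups
        "size": page_size,
        "query": {"match_all": {}},
        "collapse": {
            "field": gtid_field,
            # documents belonging to each gtid, e.g. to read their `set` values
            "inner_hits": {"name": "ip_addr", "size": inner_size},
        },
        # a fixed sort keeps the page order stable between requests
        "sort": [{gtid_field: "asc"}],
        # distinct gtid count, for the total number of pages
        "aggs": {"count_gtids": {"cardinality": {"field": gtid_field}}},
    }
```

Unlike the nested terms aggregation, `inner_hits` returns documents rather than per-set counts, so any counting per set would have to be done by the caller.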

Elastic中国官方社区 2022-12-11


You may want to read this article: "Elasticsearch: Composite Aggregation in Elasticsearch" https://elasticstack.blog.csdn.net/article/details/105369709

潦草的人生 2022-12-11

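@Elastic中国官方社区 Thank you for your support and your reply. For composite, I can't seem to find an example that uses a sub-aggregation. I have read your articles on composite and still have two questions: one is how to do a sub-aggregation under composite, the other is about accuracy. I tested composite with a single grouping and forward paging works, but it doesn't seem to cover my requirement. I would appreciate your advice and guidance.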
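On the sub-aggregation question: a `composite` aggregation accepts sub-aggregations in an `aggs` block next to it, the same way `terms` does. The sketch below assumes a single `terms` source on `gtid.keyword` and reuses the `ip_addr` sub-aggregation from the original query. The aggregation name `by_gtid` and both function names are hypothetical. Paging is forward-only: each request passes the `after_key` returned by the previous response.

```python
def composite_page_request(page_size, after_key=None,
                           gtid_field="gtid.keyword", set_field="set.keyword"):
    """Build a search body for the next page of gtid buckets via composite."""
    if page_size < 1:
        raise ValueError("page_size must be at least 1")
    composite = {
        "size": page_size,
        "sources": [{"gtid": {"terms": {"field": gtid_field}}}],
    }
    if after_key is not None:
        # resume right after the last bucket of the previous page
        composite["after"] = after_key
    return {
        "size": 0,
        "aggs": {
            "by_gtid": {
                "composite": composite,
                # sub-aggregation evaluated inside every gtid bucket
                "aggs": {"ip_addr": {"terms": {"field": set_field}}},
            }
        },
    }


def next_after_key(response):
    """Return the after_key of a search response, or None when there is none."""
    return response.get("aggregations", {}).get("by_gtid", {}).get("after_key")
```

A caller would loop: send `composite_page_request(size, key)`, read the buckets, then set `key = next_after_key(response)` and stop once it is `None` or the bucket list is empty. This walks every gtid in key order, but it cannot jump straight to page N.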