1. The scenario
// Document count
GET /test_data/_count
{
  "count" : 300018,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  }
}
// Document structure
GET /test_data/_search?size=2
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 10000,
      "relation" : "gte"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "test_data",
        "_type" : "gtid",
        "_id" : "kVc-7IQBTJ6sJOcxHqQH",
        "_score" : 1.0,
        "_source" : {
          "set" : "192.168.37.39",
          "gtid" : "5919e4aa-9c-a9-69744452-19"
        }
      },
      {
        "_index" : "test_data",
        "_type" : "gtid",
        "_id" : "klc-7IQBTJ6sJOcxHqQH",
        "_score" : 1.0,
        "_source" : {
          "set" : "192.168.37.40",
          "gtid" : "81c8fb6b-59-55-19621cdf-70"
        }
      }
    ]
  }
}
Aggregate by gtid, then by set within each gtid (a nested aggregation):
GET /test_data/_search?filter_path=aggregations
{
  "query": {
    "match_all": {}
  },
  "aggs": {
    "many_ipaddr_by_gtid": {
      "terms": {
        "field": "gtid.keyword",
        "size": 2
      },
      "aggs": {
        "ip_addr": {
          "terms": {
            "field": "set.keyword"
          }
        }
      }
    }
  }
}
{
  "aggregations" : {
    "many_ipaddr_by_gtid" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 300008,
      "buckets" : [
        {
          "key" : "1-2e-2-637f4ff1-1f",
          "doc_count" : 8,
          "ip_addr" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : "192.168.37.42",
                "doc_count" : 4
              },
              {
                "key" : "192.168.37.37",
                "doc_count" : 3
              },
              {
                "key" : "192.168.37.38",
                "doc_count" : 1
              }
            ]
          }
        },
        {
          "key" : "1-2e-3-637f4ff3-17",
          "doc_count" : 2,
          "ip_addr" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : "192.168.37.37",
                "doc_count" : 1
              },
              {
                "key" : "192.168.37.42",
                "doc_count" : 1
              }
            ]
          }
        }
      ]
    }
  }
}
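2. The problem
1. I need to paginate the aggregated results.
That involves two things: getting the total number of pages, and fetching a given page.
To be clear, the pagination should be over the gtid buckets, not over the set buckets.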
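For the page-count half of this, one common approach is to estimate the number of distinct gtid values with a cardinality aggregation and divide by the page size, rounding up. The request below is only a sketch: it assumes the gtid.keyword sub-field used elsewhere in this post, and the aggregation name gtid_count is made up for illustration. Cardinality is an approximate count, so on an index with many distinct gtid values the last page number can be slightly off.

```
// Sketch: approximate number of distinct gtid values.
// "gtid_count" is an illustrative name, not from the original post.
// precision_threshold is raised to its maximum (40000) to reduce the error.
GET /test_data/_search?filter_path=aggregations
{
  "size": 0,
  "aggs": {
    "gtid_count": {
      "cardinality": {
        "field": "gtid.keyword",
        "precision_threshold": 40000
      }
    }
  }
}
```

With a page size of 10, the total page count would then be the returned value divided by 10, rounded up.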
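3. References I consulted
1. https://www.cnblogs.com/LiuFqiang/p/16793018.html
2. https://blog.csdn.net/UbuntuTouch/article/details/103679273
3. https://blog.csdn.net/laoyang360/article/details/79112946
The first one is the easiest for me to follow, but I still can't work out how to apply it.
One passage in it filters like the query below. I adapted it to my own data but could not get any useful result: nothing changed compared with my first query.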
GET /test_data/_search
{
  "from": 0,
  "size": 0,
  "query": {
    "match_all": {}
  },
  "aggs": {
    "many_ipaddr_by_gtid": {
      "terms": {
        "field": "gtid.keyword",
        "size": 3
      },
      "aggs": {
        "gtid_sort": {
          "bucket_sort": {
            "from": 1,
            "size": 3,
            "sort": []
          }
        },
        "aggs": {
          "ip_addr": {
            "terms": {
              "field": "set.keyword"
            }
          }
        }
      }
    },
    "count_gtids": {
      "cardinality": {
        "field": "extern.paragraph_id"
      }
    }
  }
}
// error:
{
  "error" : {
    "root_cause" : [
      {
        "type" : "illegal_argument_exception",
        "reason" : "Aggregation [bucketSort] cannot define sub-aggregations"
      }
    ],
    "type" : "illegal_argument_exception",
    "reason" : "Aggregation [bucketSort] cannot define sub-aggregations"
  },
  "status" : 400
}
I don't really understand bucket_sort or how to paginate aggregation results. Could someone take a look at this?
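The error message points at the cause: bucket_sort is a pipeline aggregation, so it cannot have sub-aggregations of its own. One way to restructure the query, offered as an untested sketch rather than a confirmed fix, is to place ip_addr and the bucket_sort side by side inside the aggs of the gtid terms aggregation. The page size of 10, the choice of page 2, the name gtid_page, and the ordering by key are all assumptions for illustration. The cardinality here counts gtid.keyword, since extern.paragraph_id does not appear in the sample documents.

```
// Sketch: page 2 of the gtid buckets, 10 buckets per page.
// "gtid_page" and the paging numbers are illustrative assumptions.
GET /test_data/_search?filter_path=aggregations
{
  "size": 0,
  "aggs": {
    "many_ipaddr_by_gtid": {
      "terms": {
        "field": "gtid.keyword",
        "size": 20,
        "order": { "_key": "asc" }
      },
      "aggs": {
        "ip_addr": {
          "terms": {
            "field": "set.keyword"
          }
        },
        "gtid_page": {
          "bucket_sort": {
            "from": 10,
            "size": 10
          }
        }
      }
    },
    "count_gtids": {
      "cardinality": {
        "field": "gtid.keyword"
      }
    }
  }
}
```

bucket_sort only trims buckets that the parent terms aggregation has already produced, so the terms size has to be at least from + size (20 here). With a terms size of 3 and from set to 1, at most two buckets can come back. This also means deep pages get expensive, because every earlier bucket is still computed.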
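Field collapsing.
I got the behavior I wanted with field collapsing. Aggregations were more complex than this case needs: the data volume is large but the documents have only a few fields, so collapsing on the field is logically enough to cover this requirement.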
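The reply does not include the request itself. A minimal sketch of what a collapse-based version could look like, assuming the collapse is on gtid.keyword and the set values are pulled in through inner_hits; the inner_hits name sets, the sort, and the paging numbers are hypothetical:

```
// Sketch: page 2 of distinct gtid values, 10 per page, using field collapsing.
// "sets" is an illustrative inner_hits name, not from the original post.
GET /test_data/_search
{
  "from": 10,
  "size": 10,
  "query": {
    "match_all": {}
  },
  "collapse": {
    "field": "gtid.keyword",
    "inner_hits": {
      "name": "sets",
      "size": 10,
      "_source": ["set"]
    }
  },
  "sort": [
    { "gtid.keyword": "asc" }
  ]
}
```

Here from and size page over the collapsed groups, so each hit is one gtid. Two caveats: hits.total still counts matching documents rather than groups, so the page count needs a separate distinct count of gtid; and inner_hits returns the individual documents per gtid, not a per-set doc_count as the terms sub-aggregation does.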
You may also want to read this article: "Elasticsearch: Composite Aggregation in Elasticsearch" https://elasticstack.blog.csdn.net/article/details/105369709
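Applied to the index in this thread, a composite aggregation might look like the sketch below. It is an assumption about how the article's technique maps onto this data, not something taken from the article: the name gtid_pages and the page size are illustrative.

```
// Sketch: first page of gtid buckets via a composite aggregation.
// "gtid_pages" is an illustrative name.
GET /test_data/_search?filter_path=aggregations
{
  "size": 0,
  "aggs": {
    "gtid_pages": {
      "composite": {
        "size": 10,
        "sources": [
          { "gtid": { "terms": { "field": "gtid.keyword" } } }
        ]
      },
      "aggs": {
        "ip_addr": {
          "terms": {
            "field": "set.keyword"
          }
        }
      }
    }
  }
}

// Next page: pass the after_key from the previous response as "after".
// The gtid value below is only a placeholder taken from the sample output.
GET /test_data/_search?filter_path=aggregations
{
  "size": 0,
  "aggs": {
    "gtid_pages": {
      "composite": {
        "size": 10,
        "sources": [
          { "gtid": { "terms": { "field": "gtid.keyword" } } }
        ],
        "after": { "gtid": "1-2e-3-637f4ff3-17" }
      },
      "aggs": {
        "ip_addr": {
          "terms": {
            "field": "set.keyword"
          }
        }
      }
    }
  }
}
```

Composite pagination only moves forward from the last key, so it suits scrolling through all gtid buckets but not jumping to an arbitrary page number, and it does not report a total page count.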