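Flume: splitting the event body into custom fields with source interceptors and sending the data to ES, but ES does not show the fields split as expected
The Flume configuration is as follows: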
agent.sources = tail
agent.channels = memoryChannel
agent.sinks = elasticsearch
agent.channels.memoryChannel.type = memory
agent.channels.memoryChannel.capacity = 10000
agent.channels.memoryChannel.transactionCapacity = 100
agent.sources.tail.channels = memoryChannel
agent.sources.tail.type = syslogudp
agent.sources.tail.port = 44445
agent.sources.tail.host = 10.121.12.59
agent.sources.tail.interceptors=i1 i2 i3
agent.sources.tail.interceptors.i1.type=REGEX_EXTRACTOR
agent.sources.tail.interceptors.i1.regex = (^<.*>)([A-Za-z]{0,5})\\s([0-9]{1,2})\\s(\\d{2}:\\d{2}:\\d{2})\\s(\\d{4}):(.*)
agent.sources.tail.interceptors.i1.serializers = s1 s2 s3 s4 s5 s6
agent.sources.tail.interceptors.i1.serializers.s1.name = t1
agent.sources.tail.interceptors.i1.serializers.s2.name = t2
agent.sources.tail.interceptors.i1.serializers.s3.name = t3
agent.sources.tail.interceptors.i1.serializers.s4.name = t4
agent.sources.tail.interceptors.i1.serializers.s5.name = t5
agent.sources.tail.interceptors.i1.serializers.s6.name = t6
#add here up
agent.sources.tail.interceptors.i2.type=org.apache.flume.interceptor.TimestampInterceptor$Builder
agent.sources.tail.interceptors.i3.type=org.apache.flume.interceptor.HostInterceptor$Builder
agent.sources.tail.interceptors.i3.hostHeader = host
agent.sinks.elasticsearch.channel = memoryChannel
agent.sinks.elasticsearch.type=org.apache.flume.sink.elasticsearch.ElasticSearchSink
agent.sinks.elasticsearch.batchSize=100
agent.sinks.elasticsearch.hostNames=10.121.12.59:9300
#agent.sinks.k1.indexType = bar_type
agent.sinks.elasticsearch.indexName=logstash5
agent.sinks.elasticsearch.clusterName=elasticsearch
agent.sinks.elasticsearch.serializer=org.apache.flume.sink.elasticsearch.ElasticSearchLogStashEventSerializer
Regular expression (case 1, colon before the message):
agent.sources.tail.interceptors.i1.regex = (^<.*>)([A-Za-z]{0,5})\\s([0-9]{1,2})\\s(\\d{2}:\\d{2}:\\d{2})\\s(\\d{4}):(.*)
Data sent:
echo "<96>Apr 19 11:30:46 2016:TEST" | nc -u 10.121.12.59 44445
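ES parses this correctly: t1 through t6 are all displayed, and each one holds the matching value.
Regular expression (case 2, space before the message):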
agent.sources.tail.interceptors.i1.regex = (^<.*>)([A-Za-z]{0,5})\\s([0-9]{1,2})\\s(\\d{2}:\\d{2}:\\d{2})\\s(\\d{4})\\s(.*)
Data sent:
echo "<96>Apr 19 11:30:46 2016 TEST" | nc -u 10.121.12.59 44445
ES does not parse this correctly: the message field received by ES contains only TEST, and t1 through t6 are missing entirely.
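To rule out the regular expressions themselves, both patterns can be checked offline against the two payloads. The following is a minimal sketch in Python rather than Java, assuming the properties file unescapes `\\s` to `\s` before the pattern is compiled and that the interceptor does a find-style match. The helper name `extract` is made up for this illustration.

```python
import re

# Patterns as the regex engine would see them, assuming "\\s" in the
# properties file is unescaped to "\s" when the configuration is loaded.
REGEX_COLON = r"(^<.*>)([A-Za-z]{0,5})\s([0-9]{1,2})\s(\d{2}:\d{2}:\d{2})\s(\d{4}):(.*)"
REGEX_SPACE = r"(^<.*>)([A-Za-z]{0,5})\s([0-9]{1,2})\s(\d{2}:\d{2}:\d{2})\s(\d{4})\s(.*)"


def extract(pattern, body):
    """Hypothetical helper: return {"t1": ..., "t6": ...}, or None if no match."""
    m = re.search(pattern, body)  # find-style match, not a full match
    if m is None:
        return None
    return {f"t{i}": m.group(i) for i in range(1, 7)}


if __name__ == "__main__":
    # Each pattern against the payload it was written for.
    print(extract(REGEX_COLON, "<96>Apr 19 11:30:46 2016:TEST"))
    print(extract(REGEX_SPACE, "<96>Apr 19 11:30:46 2016 TEST"))
    # What the interceptor would produce if the body were already just "TEST".
    print(extract(REGEX_SPACE, "TEST"))
```

Both patterns match their own payload in isolation and yield six groups, while a body that has already been reduced to TEST matches nothing. That suggests the second pattern is not wrong by itself, and that the body reaching the interceptor may not be the full line that was sent.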
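Guess:
After repeated testing, my guess is that ES handles input of the form <96>Apr_19_11:30:46_2016_ (underscores stand for spaces, for display purposes only) through some other internal parsing path.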
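A different possibility that would also fit the symptom is that a syslog-style header parser consumes the line before the interceptor sees it, treating 2016 as the hostname and TEST as the message. The sketch below only illustrates that hypothesis. The pattern `RFC3164_LIKE` and the function `strip_header` are simplified assumptions written for this post, not code taken from Flume or ES.

```python
import re

# Hypothetical, simplified RFC 3164-style layout:
#   <PRI>Mmm dd HH:MM:SS HOST MSG
# Illustration only; not the actual pattern used by Flume or ES.
RFC3164_LIKE = re.compile(
    r"^<(\d{1,3})>"                                      # priority, e.g. <96>
    r"([A-Z][a-z]{2}\s{1,2}\d{1,2}\s\d{2}:\d{2}:\d{2})"  # timestamp
    r"\s([\w][-.\w]*)"                                   # hostname
    r"\s(.*)$"                                           # message
)


def strip_header(raw):
    """Return only the message part if the line looks like a syslog header,
    otherwise return the line unchanged."""
    m = RFC3164_LIKE.match(raw)
    if m is None:
        return raw
    return m.group(4)


if __name__ == "__main__":
    # Case 1: no whitespace after "2016", so the header pattern does not match.
    print(strip_header("<96>Apr 19 11:30:46 2016:TEST"))
    # Case 2: "2016" fits the hostname slot and only the message is left.
    print(strip_header("<96>Apr 19 11:30:46 2016 TEST"))
```

Under this assumption the first payload passes through unchanged, while the second is reduced to TEST, which matches what ES shows. If that is what happens here, the `keepFields` setting described for the syslog sources in the Flume User Guide might be worth testing; I have not verified this.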
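Any help is appreciated. I am not familiar with Java, so if the Flume source code needs to be changed, please tell me exactly where to change it. Thanks.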