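Flume shuts down unexpectedly
Hello everyone,
I am using Flume 1.5.0.1 to collect logs, with an exec source that runs a tail command. After running for a few dozen minutes, Flume shuts down. Please help me analyze the cause.
The configuration file is as follows: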
agent.sources = src_hclog
agent.sinks = sink_hclog
agent.channels = ch_hclog
# configure sources
agent.sources.src_hclog.type = exec
agent.sources.src_hclog.command = tail -F -n 0 /opt/tomcat-hcservice-9092/logs/flowlog
agent.sources.src_hclog.channels = ch_hclog
agent.sources.src_hclog.interceptors = inter
agent.sources.src_hclog.interceptors.inter.type = REGEX_FILTER
agent.sources.src_hclog.interceptors.inter.regex = \\[APPID].*\\[DATE].*\\[LEVEL].*\\[INTERFACE].*STATISTICS-(START|END|CLIENT)*
agent.sources.src_hclog.interceptors.inter.excludeRegex = false
agent.sources.src_hclog.restart = true
agent.sources.src_hclog.restartThrottle = 5000
# configure sinks
agent.sinks.sink_hclog.type = org.apache.flume.plugins.KafkaSink
agent.sinks.sink_hclog.channel = ch_hclog
agent.sinks.sink_hclog.metadata.broker.list = 10.3.32.157:9092
agent.sinks.sink_hclog.partition.key = 0
agent.sinks.sink_hclog.partitioner.class = org.apache.flume.plugins.SinglePartition
agent.sinks.sink_hclog.serializer.class = kafka.serializer.StringEncoder
agent.sinks.sink_hclog.request.required.acks = 0
agent.sinks.sink_hclog.max.message.size = 1000000
agent.sinks.sink_hclog.producer.type = sync
agent.sinks.sink_hclog.custom.encoding = UTF-8
agent.sinks.sink_hclog.custom.topic.name = HCServiceLog
# configure channels
agent.channels.ch_hclog.type = memory
agent.channels.ch_hclog.capacity = 10000
agent.channels.ch_hclog.transactionCapacity = 10000
agent.channels.ch_hclog.byteCapacityBufferPercentage = 20
agent.channels.ch_hclog.byteCapacity = 800000
The log is as follows:
10 三月 2015 15:22:27,550 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.plugins.KafkaSink.process:137) - Send Message to Kafka : [[APPID]HCService [DATE]2015-03-10 15:22:21,064 [THREAD]http-9092-2 [LEVEL]INFO [CLASS]com.xikang.service.util.LogInterceptor [METHOD]convert [LINE]61 [MESSAGE][DURATION]81ms [INTERFACE]POST/account/register *STATISTICS-END*] -- [{ headers:{} body: 5B 41 50 50 49 44 5D 48 43 53 65 72 76 69 63 65 [APPID]HCService }]
10 三月 2015 15:22:54,735 INFO [pool-5-thread-1] (org.apache.flume.source.ExecSource$ExecRunnable.run:362) - Restarting in 5000ms, exit code 129
10 三月 2015 15:22:54,735 INFO [agent-shutdown-hook] (org.apache.flume.lifecycle.LifecycleSupervisor.stop:79) - Stopping lifecycle supervisor 11
10 三月 2015 15:22:54,742 INFO [agent-shutdown-hook] (kafka.utils.Logging$class.info:67) - Shutting down producer
10 三月 2015 15:22:54,743 INFO [agent-shutdown-hook] (kafka.utils.Logging$class.info:67) - Closing all sync producers
10 三月 2015 15:22:54,743 INFO [agent-shutdown-hook] (kafka.utils.Logging$class.info:67) - Disconnecting from xikang-hxwl-D5-app8:9092
10 三月 2015 15:22:54,744 INFO [agent-shutdown-hook] (org.apache.flume.node.PollingPropertiesFileConfigurationProvider.stop:83) - Configuration provider stopping
10 三月 2015 15:22:54,744 INFO [agent-shutdown-hook] (org.apache.flume.source.ExecSource.stop:186) - Stopping exec source with command:tail -F -n 0 /opt/tomcat-hcservice-9092/logs/flowlog
10 三月 2015 15:22:54,744 INFO [agent-shutdown-hook] (org.apache.flume.instrumentation.MonitoredCounterGroup.stop:149) - Component type: SOURCE, name: src_hclog stopped
10 三月 2015 15:22:54,745 INFO [agent-shutdown-hook] (org.apache.flume.instrumentation.MonitoredCounterGroup.stop:155) - Shutdown Metric for type: SOURCE, name: src_hclog. source.start.time == 1425970016148
10 三月 2015 15:22:54,745 INFO [agent-shutdown-hook] (org.apache.flume.instrumentation.MonitoredCounterGroup.stop:161) - Shutdown Metric for type: SOURCE, name: src_hclog. source.stop.time == 1425972174744
10 三月 2015 15:22:54,745 INFO [agent-shutdown-hook] (org.apache.flume.instrumentation.MonitoredCounterGroup.stop:177) - Shutdown Metric for type: SOURCE, name: src_hclog. src.append-batch.accepted == 0
10 三月 2015 15:22:54,745 INFO [agent-shutdown-hook] (org.apache.flume.instrumentation.MonitoredCounterGroup.stop:177) - Shutdown Metric for type: SOURCE, name: src_hclog. src.append-batch.received == 0
10 三月 2015 15:22:54,745 INFO [agent-shutdown-hook] (org.apache.flume.instrumentation.MonitoredCounterGroup.stop:177) - Shutdown Metric for type: SOURCE, name: src_hclog. src.append.accepted == 0
10 三月 2015 15:22:54,745 INFO [agent-shutdown-hook] (org.apache.flume.instrumentation.MonitoredCounterGroup.stop:177) - Shutdown Metric for type: SOURCE, name: src_hclog. src.append.received == 0
10 三月 2015 15:22:54,745 INFO [agent-shutdown-hook] (org.apache.flume.instrumentation.MonitoredCounterGroup.stop:177) - Shutdown Metric for type: SOURCE, name: src_hclog. src.events.accepted == 865
10 三月 2015 15:22:54,745 INFO [agent-shutdown-hook] (org.apache.flume.instrumentation.MonitoredCounterGroup.stop:177) - Shutdown Metric for type: SOURCE, name: src_hclog. src.events.received == 865
10 三月 2015 15:22:54,745 INFO [agent-shutdown-hook] (org.apache.flume.instrumentation.MonitoredCounterGroup.stop:177) - Shutdown Metric for type: SOURCE, name: src_hclog. src.open-connection.count == 0
10 三月 2015 15:22:54,746 INFO [agent-shutdown-hook] (org.apache.flume.instrumentation.MonitoredCounterGroup.stop:149) - Component type: CHANNEL, name: ch_hclog stopped
10 三月 2015 15:22:54,746 INFO [agent-shutdown-hook] (org.apache.flume.instrumentation.MonitoredCounterGroup.stop:155) - Shutdown Metric for type: CHANNEL, name: ch_hclog. channel.start.time == 1425970015666
10 三月 2015 15:22:54,746 INFO [agent-shutdown-hook] (org.apache.flume.instrumentation.MonitoredCounterGroup.stop:161) - Shutdown Metric for type: CHANNEL, name: ch_hclog. channel.stop.time == 1425972174746
10 三月 2015 15:22:54,746 INFO [agent-shutdown-hook] (org.apache.flume.instrumentation.MonitoredCounterGroup.stop:177) - Shutdown Metric for type: CHANNEL, name: ch_hclog. channel.capacity == 10000
10 三月 2015 15:22:54,746 INFO [agent-shutdown-hook] (org.apache.flume.instrumentation.MonitoredCounterGroup.stop:177) - Shutdown Metric for type: CHANNEL, name: ch_hclog. channel.current.size == 3
10 三月 2015 15:22:54,746 INFO [agent-shutdown-hook] (org.apache.flume.instrumentation.MonitoredCounterGroup.stop:177) - Shutdown Metric for type: CHANNEL, name: ch_hclog. channel.event.put.attempt == 488
10 三月 2015 15:22:54,746 INFO [agent-shutdown-hook] (org.apache.flume.instrumentation.MonitoredCounterGroup.stop:177) - Shutdown Metric for type: CHANNEL, name: ch_hclog. channel.event.put.success == 488
10 三月 2015 15:22:54,746 INFO [agent-shutdown-hook] (org.apache.flume.instrumentation.MonitoredCounterGroup.stop:177) - Shutdown Metric for type: CHANNEL, name: ch_hclog. channel.event.take.attempt == 853
10 三月 2015 15:22:54,746 INFO [agent-shutdown-hook] (org.apache.flume.instrumentation.MonitoredCounterGroup.stop:177) - Shutdown Metric for type: CHANNEL, name: ch_hclog. channel.event.take.success == 485
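My own analysis is that the process running the tail command for the exec source exited, which caused the exec source to be stopped. However, I set restart = true, so a new child process should have been started to run tail. That did not happen, and Flume simply shut down.
It looks as if some monitor noticed that the process running tail for the exec source had exited and then called stop on the exec source. The question is: which monitor is that, and how does it detect that a process started by the exec source has exited?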
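A note on the "exit code 129" reported by ExecSource in the log: on Unix-like systems, a child process killed by signal N is conventionally reported with exit status 128 + N, so 129 would correspond to signal 1 (SIGHUP on Linux). The sketch below is only an illustration of that convention, assuming a Linux host. The helper name `describe_exit_code` is hypothetical and is not part of Flume.

```python
import signal


def describe_exit_code(code):
    """Interpret an exit status using the common 128 + N convention for signals."""
    if code > 128:
        signum = code - 128
        try:
            # Look up the symbolic name of the signal on this platform.
            name = signal.Signals(signum).name
        except ValueError:
            name = "unknown signal"
        return f"terminated by signal {signum} ({name})"
    return f"exited normally with status {code}"


# The exit code taken from the ExecSource log line above.
print(describe_exit_code(129))
```

If the signal does turn out to be SIGHUP, one thing worth checking is whether the agent was started from a terminal session that was later closed, since that typically delivers SIGHUP to the processes started from it.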
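One way to see who called stop is to look at the thread name in square brackets on each log line. In the log above, every stop message comes from the thread `agent-shutdown-hook`, which is the name Flume gives to the JVM shutdown hook it registers at startup. That suggests the JVM itself was asked to terminate, rather than a monitor reacting to the tail process exiting. The following is a rough sketch that counts log lines per thread. The regular expression is an assumption based only on the log format shown in this post, and `threads_in_log` is a hypothetical helper name.

```python
import re
from collections import Counter

# Assumed layout: "<day> <month> <year> <time> <LEVEL> [<thread>] (<location>) - <message>"
LINE_RE = re.compile(
    r"^\d+ \S+ \d{4} [\d:,]+ (\w+) \[([^\]]+)\] \(([^)]+)\) - (.*)$"
)


def threads_in_log(lines):
    """Count how many log lines were written by each thread."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            counts[match.group(2)] += 1
    return counts


# Three lines copied from the log in this post.
sample = [
    "10 三月 2015 15:22:54,735 INFO [pool-5-thread-1] (org.apache.flume.source.ExecSource$ExecRunnable.run:362) - Restarting in 5000ms, exit code 129",
    "10 三月 2015 15:22:54,735 INFO [agent-shutdown-hook] (org.apache.flume.lifecycle.LifecycleSupervisor.stop:79) - Stopping lifecycle supervisor 11",
    "10 三月 2015 15:22:54,744 INFO [agent-shutdown-hook] (org.apache.flume.source.ExecSource.stop:186) - Stopping exec source with command:tail -F -n 0 /opt/tomcat-hcservice-9092/logs/flowlog",
]

for thread, count in threads_in_log(sample).most_common():
    print(thread, count)
```

Note that the "Restarting in 5000ms" message and the first shutdown-hook message share the same timestamp, so the restart may simply never have had a chance to run before the agent stopped.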