Flume shuts down unexpectedly

william_Fu_Z 2015-03-16 05:36:24
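Hello everyone,
I'm using Flume 1.5.0.1 to collect logs, with an exec source that runs a tail command. After a few dozen minutes, Flume shuts down. Please help me analyze the cause.
  The configuration file is as follows: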
agent.sources = src_hclog
agent.sinks = sink_hclog
agent.channels = ch_hclog

# configure sources
agent.sources.src_hclog.type = exec
agent.sources.src_hclog.command = tail -F -n 0 /opt/tomcat-hcservice-9092/logs/flowlog
agent.sources.src_hclog.channels = ch_hclog
agent.sources.src_hclog.interceptors = inter
agent.sources.src_hclog.interceptors.inter.type = REGEX_FILTER
agent.sources.src_hclog.interceptors.inter.regex = \\[APPID].*\\[DATE].*\\[LEVEL].*\\[INTERFACE].*STATISTICS-(START|END|CLIENT)*
agent.sources.src_hclog.interceptors.inter.excludeRegex = false
agent.sources.src_hclog.restart = true
agent.sources.src_hclog.restartThrottle = 5000


# configure sinks
agent.sinks.sink_hclog.type = org.apache.flume.plugins.KafkaSink
agent.sinks.sink_hclog.channel = ch_hclog
agent.sinks.sink_hclog.metadata.broker.list = 10.3.32.157:9092
agent.sinks.sink_hclog.partition.key = 0
agent.sinks.sink_hclog.partitioner.class = org.apache.flume.plugins.SinglePartition
agent.sinks.sink_hclog.serializer.class = kafka.serializer.StringEncoder
agent.sinks.sink_hclog.request.required.acks = 0
agent.sinks.sink_hclog.max.message.size = 1000000
agent.sinks.sink_hclog.producer.type = sync
agent.sinks.sink_hclog.custom.encoding = UTF-8
agent.sinks.sink_hclog.custom.topic.name = HCServiceLog

# configure channels
agent.channels.ch_hclog.type = memory
agent.channels.ch_hclog.capacity = 10000
agent.channels.ch_hclog.transactionCapacity = 10000
agent.channels.ch_hclog.byteCapacityBufferPercentage = 20
agent.channels.ch_hclog.byteCapacity = 800000

The log is as follows:
10 三月 2015 15:22:27,550 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.plugins.KafkaSink.process:137) - Send Message to Kafka : [[APPID]HCService [DATE]2015-03-10 15:22:21,064 [THREAD]http-9092-2 [LEVEL]INFO [CLASS]com.xikang.service.util.LogInterceptor [METHOD]convert [LINE]61 [MESSAGE][DURATION]81ms [INTERFACE]POST/account/register *STATISTICS-END*] -- [{ headers:{} body: 5B 41 50 50 49 44 5D 48 43 53 65 72 76 69 63 65 [APPID]HCService }]
10 三月 2015 15:22:54,735 INFO [pool-5-thread-1] (org.apache.flume.source.ExecSource$ExecRunnable.run:362) - Restarting in 5000ms, exit code 129
10 三月 2015 15:22:54,735 INFO [agent-shutdown-hook] (org.apache.flume.lifecycle.LifecycleSupervisor.stop:79) - Stopping lifecycle supervisor 11
10 三月 2015 15:22:54,742 INFO [agent-shutdown-hook] (kafka.utils.Logging$class.info:67) - Shutting down producer
10 三月 2015 15:22:54,743 INFO [agent-shutdown-hook] (kafka.utils.Logging$class.info:67) - Closing all sync producers
10 三月 2015 15:22:54,743 INFO [agent-shutdown-hook] (kafka.utils.Logging$class.info:67) - Disconnecting from xikang-hxwl-D5-app8:9092
10 三月 2015 15:22:54,744 INFO [agent-shutdown-hook] (org.apache.flume.node.PollingPropertiesFileConfigurationProvider.stop:83) - Configuration provider stopping
10 三月 2015 15:22:54,744 INFO [agent-shutdown-hook] (org.apache.flume.source.ExecSource.stop:186) - Stopping exec source with command:tail -F -n 0 /opt/tomcat-hcservice-9092/logs/flowlog
10 三月 2015 15:22:54,744 INFO [agent-shutdown-hook] (org.apache.flume.instrumentation.MonitoredCounterGroup.stop:149) - Component type: SOURCE, name: src_hclog stopped
10 三月 2015 15:22:54,745 INFO [agent-shutdown-hook] (org.apache.flume.instrumentation.MonitoredCounterGroup.stop:155) - Shutdown Metric for type: SOURCE, name: src_hclog. source.start.time == 1425970016148
10 三月 2015 15:22:54,745 INFO [agent-shutdown-hook] (org.apache.flume.instrumentation.MonitoredCounterGroup.stop:161) - Shutdown Metric for type: SOURCE, name: src_hclog. source.stop.time == 1425972174744
10 三月 2015 15:22:54,745 INFO [agent-shutdown-hook] (org.apache.flume.instrumentation.MonitoredCounterGroup.stop:177) - Shutdown Metric for type: SOURCE, name: src_hclog. src.append-batch.accepted == 0
10 三月 2015 15:22:54,745 INFO [agent-shutdown-hook] (org.apache.flume.instrumentation.MonitoredCounterGroup.stop:177) - Shutdown Metric for type: SOURCE, name: src_hclog. src.append-batch.received == 0
10 三月 2015 15:22:54,745 INFO [agent-shutdown-hook] (org.apache.flume.instrumentation.MonitoredCounterGroup.stop:177) - Shutdown Metric for type: SOURCE, name: src_hclog. src.append.accepted == 0
10 三月 2015 15:22:54,745 INFO [agent-shutdown-hook] (org.apache.flume.instrumentation.MonitoredCounterGroup.stop:177) - Shutdown Metric for type: SOURCE, name: src_hclog. src.append.received == 0
10 三月 2015 15:22:54,745 INFO [agent-shutdown-hook] (org.apache.flume.instrumentation.MonitoredCounterGroup.stop:177) - Shutdown Metric for type: SOURCE, name: src_hclog. src.events.accepted == 865
10 三月 2015 15:22:54,745 INFO [agent-shutdown-hook] (org.apache.flume.instrumentation.MonitoredCounterGroup.stop:177) - Shutdown Metric for type: SOURCE, name: src_hclog. src.events.received == 865
10 三月 2015 15:22:54,745 INFO [agent-shutdown-hook] (org.apache.flume.instrumentation.MonitoredCounterGroup.stop:177) - Shutdown Metric for type: SOURCE, name: src_hclog. src.open-connection.count == 0
10 三月 2015 15:22:54,746 INFO [agent-shutdown-hook] (org.apache.flume.instrumentation.MonitoredCounterGroup.stop:149) - Component type: CHANNEL, name: ch_hclog stopped
10 三月 2015 15:22:54,746 INFO [agent-shutdown-hook] (org.apache.flume.instrumentation.MonitoredCounterGroup.stop:155) - Shutdown Metric for type: CHANNEL, name: ch_hclog. channel.start.time == 1425970015666
10 三月 2015 15:22:54,746 INFO [agent-shutdown-hook] (org.apache.flume.instrumentation.MonitoredCounterGroup.stop:161) - Shutdown Metric for type: CHANNEL, name: ch_hclog. channel.stop.time == 1425972174746
10 三月 2015 15:22:54,746 INFO [agent-shutdown-hook] (org.apache.flume.instrumentation.MonitoredCounterGroup.stop:177) - Shutdown Metric for type: CHANNEL, name: ch_hclog. channel.capacity == 10000
10 三月 2015 15:22:54,746 INFO [agent-shutdown-hook] (org.apache.flume.instrumentation.MonitoredCounterGroup.stop:177) - Shutdown Metric for type: CHANNEL, name: ch_hclog. channel.current.size == 3
10 三月 2015 15:22:54,746 INFO [agent-shutdown-hook] (org.apache.flume.instrumentation.MonitoredCounterGroup.stop:177) - Shutdown Metric for type: CHANNEL, name: ch_hclog. channel.event.put.attempt == 488
10 三月 2015 15:22:54,746 INFO [agent-shutdown-hook] (org.apache.flume.instrumentation.MonitoredCounterGroup.stop:177) - Shutdown Metric for type: CHANNEL, name: ch_hclog. channel.event.put.success == 488
10 三月 2015 15:22:54,746 INFO [agent-shutdown-hook] (org.apache.flume.instrumentation.MonitoredCounterGroup.stop:177) - Shutdown Metric for type: CHANNEL, name: ch_hclog. channel.event.take.attempt == 853
10 三月 2015 15:22:54,746 INFO [agent-shutdown-hook] (org.apache.flume.instrumentation.MonitoredCounterGroup.stop:177) - Shutdown Metric for type: CHANNEL, name: ch_hclog. channel.event.take.success == 485


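My own analysis is that the process running the tail command for the exec source exited, which caused the exec source to be stopped. But I set restart = true, so a new Java process should have been created to run tail. That did not happen, and Flume simply shut down.
It looks as if some monitor noticed that the process running tail inside the exec source had exited and then called stop on the exec source. The question is: how does that monitor detect that a Java process inside the exec source has exited?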
fbpcchen 2015-09-08
You probably didn't start it in the background, did you?
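If the agent was launched from an interactive shell, closing that terminal typically results in SIGHUP being sent to the processes started from it, which is one way an agent can stop without logging any error first. The following is a minimal sketch of launching a command in its own session so that a terminal hangup should not reach it. `launch_detached` and the output file name are hypothetical, and the `flume-ng` command line in the comment is an assumed invocation that needs adjusting to the real installation.

```python
import subprocess


def launch_detached(cmd, log_path):
    """Start cmd in a new session with no controlling terminal.

    start_new_session=True makes the child call setsid(), so closing the
    launching terminal should not deliver SIGHUP to it.
    """
    log = open(log_path, "ab")
    try:
        return subprocess.Popen(
            cmd,
            stdin=subprocess.DEVNULL,
            stdout=log,
            stderr=subprocess.STDOUT,
            start_new_session=True,
        )
    finally:
        log.close()  # the child keeps its own copy of the descriptor


# Assumed invocation; adjust the paths and agent name to your installation:
# launch_detached(
#     ["flume-ng", "agent", "--conf", "conf",
#      "--conf-file", "conf/flume.conf", "--name", "agent"],
#     "flume-agent.out",
# )
```

This has roughly the same effect as starting the agent with `setsid` from a shell. It only rules out the terminal as the sender of the signal; it does not explain a SIGHUP that comes from somewhere else.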
calmkey 2015-08-20
Hi, I've run into this problem too. How do I solve it?
william_Fu_Z 2015-04-10
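Hello to the poster above. I traced Flume with strace and found that the Linux kernel sent a SIGHUP signal to the Flume process. By default, a process that receives SIGHUP exits. I could not find out why Linux sent the SIGHUP. However, another Flume instance was running on the same server, and it seems that two Flume instances cannot run on one server. After I killed that other Flume process, my Flume no longer exited. I hope this helps.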
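The exit codes in the posted logs are consistent with a signal being involved. On Linux, a child process killed by signal N is conventionally reported with exit status 128 + N, and the exit code that ExecSource logs appears to follow that convention. Under that assumption, the 129 in the original post corresponds to SIGHUP and the 143 in the other reply corresponds to SIGTERM. A small sketch that decodes such codes (`describe_exit_code` is a hypothetical helper, not part of Flume):

```python
import signal


def describe_exit_code(code: int) -> str:
    """Map a shell-style exit code to a signal name when it follows 128 + N."""
    if code > 128:
        try:
            return signal.Signals(code - 128).name
        except ValueError:
            pass  # not a valid signal number on this platform
    return f"exit status {code}"


# The two exit codes that appear in this thread's logs.
for code in (129, 143):
    print(code, "->", describe_exit_code(code))
```

On Linux this reports SIGHUP for 129 and SIGTERM for 143. It identifies which signal ended the tail process, not who sent it; tracing the process with strace, as described above, is still needed for that.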
wjx1015 2015-04-09
I'm seeing something similar to your case:
09 四月 2015 12:09:25,677 INFO [pool-6-thread-1] (org.apache.flume.source.ExecSource$ExecRunnable.run:362) - Restarting in 20000ms, exit code 143
09 四月 2015 12:09:28,335 ERROR [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.SinkRunner$PollingRunner.run:160) - Unable to deliver event. Exception follows.
java.lang.IllegalStateException: Channel closed [channel=c2]. Due to java.lang.NullPointerException: null
	at org.apache.flume.channel.file.FileChannel.createTransaction(FileChannel.java:352)
	at org.apache.flume.channel.BasicChannelSemantics.getTransaction(BasicChannelSemantics.java:122)
	at org.apache.flume.sink.AbstractRpcSink.process(AbstractRpcSink.java:333)
	at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
	at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
	at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.NullPointerException
Have you found a solution?
william_Fu_Z 2015-03-17
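A follow-up: I wrote my own Flume source to receive data. This source opens a socket server to receive data instead of creating a new Java process to run tail, so the source itself does not die. But after about 3 hours, Flume still went down. The log output was:
16 三月 2015 17:46:28,938 INFO [agent-shutdown-hook] (org.apache.flume.lifecycle.LifecycleSupervisor.stop:78) - Stopping lifecycle supervisor 11
16 三月 2015 17:46:28,941 INFO [agent-shutdown-hook] (org.apache.flume.source.NioReaderSource.stop:67) - Stopping NioReaderSource
There was no warning at all: no error was logged before the stop. I suspect Flume was killed by Linux. Why exactly does Flume go down? I would appreciate an analysis from anyone willing to help.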
