Need help: problem with Flume writing to HDFS!

u010363909 2013-05-08 04:16:02

My Flume configuration is as follows:
[demoe3base@kf-app1 conf]$ cat flume-conf.conf
# Finally, now that we've defined all of our components, tell
# agent1 which ones we want to activate.
agent1.channels = ch1
agent1.sources = source1
agent1.sinks = hdfssink1

# Define a memory channel called ch1 on agent1
agent1.channels.ch1.type = memory
agent1.channels.ch1.capacity = 100000
agent1.channels.ch1.transactionCapacity = 100000
agent1.channels.ch1.keep-alive = 30

# Define an Avro source called source1 on agent1 and tell it
# to bind to 172.21.3.60:44444. Connect it to channel ch1.
agent1.sources.source1.channels = ch1
agent1.sources.source1.type = avro
agent1.sources.source1.bind = 172.21.3.60
agent1.sources.source1.port = 44444
agent1.sources.source1.threads = 5

# Define an HDFS sink called hdfssink1 that writes the events it receives
# to HDFS, and connect it to the other end of the same channel.
agent1.sinks.hdfssink1.channel = ch1
agent1.sinks.hdfssink1.type = hdfs
agent1.sinks.hdfssink1.hdfs.path = hdfs://kf-app1:8020/flume
agent1.sinks.hdfssink1.hdfs.writeFormat = Text
agent1.sinks.hdfssink1.hdfs.fileType = DataStream
agent1.sinks.hdfssink1.hdfs.rollInterval = 0
agent1.sinks.hdfssink1.hdfs.rollSize = 60554432
agent1.sinks.hdfssink1.hdfs.rollCount = 0
agent1.sinks.hdfssink1.hdfs.batchSize = 1000
agent1.sinks.hdfssink1.hdfs.txnEventMax = 1000
agent1.sinks.hdfssink1.hdfs.callTimeout = 60000
agent1.sinks.hdfssink1.hdfs.appendTimeout = 60000

I start the agent with: bin/flume-ng agent --conf ./conf/ -f conf/flume-conf.conf -n agent1
Everything starts normally, and flume.log looks normal too.
I then send a file with bin/flume-ng avro-client -H kf-app1 -p 44444 -F /chunk1/demo/flume/test2.txt, and flume.log shows the following:
[demoe3base@kf-app1 logs]$ tail -f flume.log
08 五月 2013 14:34:31,370 INFO [lifecycleSupervisor-1-3] (org.apache.flume.instrumentation.MonitoredCounterGroup.start:82) - Component type: SOURCE, name: source1 started
08 五月 2013 14:34:31,370 INFO [lifecycleSupervisor-1-3] (org.apache.flume.source.AvroSource.start:155) - Avro source source1 started.
08 五月 2013 14:34:45,932 INFO [pool-6-thread-1] (org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.handleUpstream:171) - [id: 0x34bf1d3b, /172.21.3.61:39262 => /172.21.3.60:44444] OPEN
08 五月 2013 14:34:45,938 INFO [pool-7-thread-1] (org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.handleUpstream:171) - [id: 0x34bf1d3b, /172.21.3.61:39262 => /172.21.3.60:44444] BOUND: /172.21.3.60:44444
08 五月 2013 14:34:45,938 INFO [pool-7-thread-1] (org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.handleUpstream:171) - [id: 0x34bf1d3b, /172.21.3.61:39262 => /172.21.3.60:44444] CONNECTED: /172.21.3.61:39262
08 五月 2013 14:34:46,267 INFO [pool-7-thread-1] (org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.handleUpstream:171) - [id: 0x34bf1d3b, /172.21.3.61:39262 :> /172.21.3.60:44444] DISCONNECTED
08 五月 2013 14:34:46,267 INFO [pool-7-thread-1] (org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.handleUpstream:171) - [id: 0x34bf1d3b, /172.21.3.61:39262 :> /172.21.3.60:44444] UNBOUND
08 五月 2013 14:34:46,268 INFO [pool-7-thread-1] (org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.handleUpstream:171) - [id: 0x34bf1d3b, /172.21.3.61:39262 :> /172.21.3.60:44444] CLOSED
08 五月 2013 14:34:46,268 INFO [pool-7-thread-1] (org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.channelClosed:209) - Connection to /172.21.3.61:39262 disconnected.
08 五月 2013 14:34:46,922 INFO [hdfs-hdfssink1-call-runner-0] (org.apache.flume.sink.hdfs.BucketWriter.doOpen:189) - Creating hdfs://kf-app1:8020//FlumeData.1367994886244.tmp

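Here are my questions: 1. Why is the output the temporary file "FlumeData.1367994886244.tmp", and why is that file never closed? Only after I forcibly kill or shut down the agent does the log print "08 五月 2013 14:21:17,556 INFO [hdfs-hdfssink1-call-runner-5] (org.apache.flume.sink.hdfs.BucketWriter.renameBucket:379) - Renaming hdfs://kf-app1:8020/flume/FlumeData.1367993804350.tmp to hdfs://kf-app1:8020/flume/FlumeData.1367993804350". Does this mean the agent cannot close the file automatically?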

2. Also, after I sent a second file, the log reported UNBOUND. Does this mean one channel can only receive one file?
08 五月 2013 14:30:47,202 INFO [pool-6-thread-1] (org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.handleUpstream:171) - [id: 0x5a9b8ff9, /172.21.3.61:38652 => /172.21.3.60:44444] OPEN
08 五月 2013 14:30:47,203 INFO [pool-7-thread-1] (org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.handleUpstream:171) - [id: 0x5a9b8ff9, /172.21.3.61:38652 => /172.21.3.60:44444] BOUND: /172.21.3.60:44444
08 五月 2013 14:30:47,203 INFO [pool-7-thread-1] (org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.handleUpstream:171) - [id: 0x5a9b8ff9, /172.21.3.61:38652 => /172.21.3.60:44444] CONNECTED: /172.21.3.61:38652
08 五月 2013 14:30:47,913 INFO [pool-7-thread-1] (org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.handleUpstream:171) - [id: 0x5a9b8ff9, /172.21.3.61:38652 :> /172.21.3.60:44444] DISCONNECTED
08 五月 2013 14:30:47,913 INFO [pool-7-thread-1] (org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.handleUpstream:171) - [id: 0x5a9b8ff9, /172.21.3.61:38652 :> /172.21.3.60:44444] UNBOUND
08 五月 2013 14:30:47,913 INFO [pool-7-thread-1] (org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.handleUpstream:171) - [id: 0x5a9b8ff9, /172.21.3.61:38652 :> /172.21.3.60:44444] CLOSED
08 五月 2013 14:30:47,914 INFO [pool-7-thread-1] (org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.channelClosed:209) - Connection to /172.21.3.61:38652 disconnected.

I would appreciate guidance on these two questions from anyone who understands this or has run into it before.


14 replies


hhlllp 2015-08-20

Why does my Flume agent receive three identical copies of the data?

蜉蝣撼大树 2015-05-26

My HDFS is a cluster configured through ZooKeeper, and the source receives its data from log4j. The backend keeps reporting this error: org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.exceptionCaught(NettyServer.java:201)] Unexpected exception from downstream.

蜉蝣撼大树 2015-05-26

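My configuration is the same as yours. HDFS is a cluster configured through ZooKeeper, and the source receives its data from log4j. The backend keeps reporting this error: org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.exceptionCaught(NettyServer.java:201)] Unexpected exception from downstream.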

chiweitree 2015-01-30

http://blog.csdn.net/simonchi/article/details/43231891 My blog has several articles about Flume.

fvn4edal 2014-03-04

In Flume 1.4.0, setting hdfs.idleTimeout to 5 (seconds) solves this. This article is worth reading and was very enlightening: http://boylook.blog.51cto.com/7934327/1308188
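
As a minimal sketch of that suggestion, the fragment below adds hdfs.idleTimeout to the sink from the original post. The value 5 comes from the reply above. The Flume user guide describes this property as the number of seconds of inactivity after which an open file is closed, with 0 disabling idle closing. Whether 5 seconds suits a given workload is an assumption to validate, since a short timeout on a bursty stream may produce many small files.

```properties
# Existing HDFS sink from the original post (other hdfs.* settings unchanged)
agent1.sinks.hdfssink1.type = hdfs
agent1.sinks.hdfssink1.hdfs.path = hdfs://kf-app1:8020/flume

# Close the open file after 5 seconds without writes; closing is expected to
# trigger the rename from FlumeData.<timestamp>.tmp to its final name.
# 0 disables idle closing.
agent1.sinks.hdfssink1.hdfs.idleTimeout = 5
```

The agent would need to be restarted (or the configuration reloaded) for the change to take effect.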

海兰 2013-11-18

Yes, has this been solved? Please share what you learned.

small2013bird 2013-11-15

I ran into the same problem. Has it been solved?

長胸為富 2013-05-25

I don't know the answer, but I'm bumping the thread and learning something along the way.

我想飞走 2013-05-24

Here to learn.

波特王子 2013-05-17

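It seems to work like this: 1. avro collects your logs into a single file, and the "Renaming" operation happens only when that file reaches the configured size (or when the agent is forcibly killed); 2. UNBOUND puzzled me for a while as well. My conclusion is that it is not an error message. Look closely and you will see that the line carries no "ERROR" marker at all. UNBOUND only means that the current log file has not reached the configured size and does not yet need "Renaming" into a separate file. After "Renaming", a new *.tmp file is normally started for writing. This is my understanding; corrections are welcome.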

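
To make the size-based behavior in point 1 of the reply above concrete, here is a hedged sketch of the roll triggers on the HDFS sink. According to the Flume user guide, hdfs.rollInterval (seconds), hdfs.rollSize (bytes), and hdfs.rollCount (events) each close the current file when reached, and 0 disables a trigger. With the values from the original post only the size trigger is active, so a small test file would leave the .tmp file open. The 300-second interval below is an illustrative assumption, not a value from the thread.

```properties
# Values from the original post: only the size trigger is active,
# so the file rolls after roughly 60 MB (60554432 bytes) has been written.
agent1.sinks.hdfssink1.hdfs.rollSize = 60554432
agent1.sinks.hdfssink1.hdfs.rollCount = 0

# Hypothetical change: additionally roll on time, so that a file is closed
# and renamed at most 300 seconds after it was opened.
agent1.sinks.hdfssink1.hdfs.rollInterval = 300
```

Enabling a time-based trigger alongside the size trigger is one way to bound how long a .tmp file stays open; the right interval depends on the data rate and on how many small files HDFS should hold.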

u010363909 2013-05-13