Help needed: problem writing to HDFS with Flume!

u010363909 2013-05-08 04:16:02
My Flume configuration is as follows:
[demoe3base@kf-app1 conf]$ cat flume-conf.conf
# Finally, now that we've defined all of our components, tell
# agent1 which ones we want to activate.
agent1.channels = ch1
agent1.sources = source1
agent1.sinks = hdfssink1

# Define a memory channel called ch1 on agent1
agent1.channels.ch1.type = memory
agent1.channels.ch1.capacity = 100000
agent1.channels.ch1.transactionCapacity = 100000
agent1.channels.ch1.keep-alive = 30

# Define an Avro source called source1 on agent1 and tell it
# to bind to 172.21.3.60:44444. Connect it to channel ch1.
agent1.sources.source1.channels = ch1
agent1.sources.source1.type = avro
agent1.sources.source1.bind = 172.21.3.60
agent1.sources.source1.port = 44444
agent1.sources.source1.threads = 5

# Define an HDFS sink that writes all events it receives to HDFS
# and connect it to the other end of the same channel.
agent1.sinks.hdfssink1.channel = ch1
agent1.sinks.hdfssink1.type = hdfs
agent1.sinks.hdfssink1.hdfs.path = hdfs://kf-app1:8020/flume
agent1.sinks.hdfssink1.hdfs.writeFormat = Text
agent1.sinks.hdfssink1.hdfs.fileType = DataStream
agent1.sinks.hdfssink1.hdfs.rollInterval = 0
agent1.sinks.hdfssink1.hdfs.rollSize = 60554432
agent1.sinks.hdfssink1.hdfs.rollCount = 0
agent1.sinks.hdfssink1.hdfs.batchSize = 1000
agent1.sinks.hdfssink1.hdfs.txnEventMax = 1000
agent1.sinks.hdfssink1.hdfs.callTimeout = 60000
agent1.sinks.hdfssink1.hdfs.appendTimeout = 60000

Started with: bin/flume-ng agent --conf ./conf/ -f conf/flume-conf.conf -n agent1
Everything starts up fine, and flume.log also looks normal.
Then I send a file with bin/flume-ng avro-client -H kf-app1 -p 44444 -F /chunk1/demo/flume/test2.txt, and flume.log shows:
[demoe3base@kf-app1 logs]$ tail -f flume.log
08 五月 2013 14:34:31,370 INFO [lifecycleSupervisor-1-3] (org.apache.flume.instrumentation.MonitoredCounterGroup.start:82) - Component type: SOURCE, name: source1 started
08 五月 2013 14:34:31,370 INFO [lifecycleSupervisor-1-3] (org.apache.flume.source.AvroSource.start:155) - Avro source source1 started.
08 五月 2013 14:34:45,932 INFO [pool-6-thread-1] (org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.handleUpstream:171) - [id: 0x34bf1d3b, /172.21.3.61:39262 => /172.21.3.60:44444] OPEN
08 五月 2013 14:34:45,938 INFO [pool-7-thread-1] (org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.handleUpstream:171) - [id: 0x34bf1d3b, /172.21.3.61:39262 => /172.21.3.60:44444] BOUND: /172.21.3.60:44444
08 五月 2013 14:34:45,938 INFO [pool-7-thread-1] (org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.handleUpstream:171) - [id: 0x34bf1d3b, /172.21.3.61:39262 => /172.21.3.60:44444] CONNECTED: /172.21.3.61:39262
08 五月 2013 14:34:46,267 INFO [pool-7-thread-1] (org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.handleUpstream:171) - [id: 0x34bf1d3b, /172.21.3.61:39262 :> /172.21.3.60:44444] DISCONNECTED
08 五月 2013 14:34:46,267 INFO [pool-7-thread-1] (org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.handleUpstream:171) - [id: 0x34bf1d3b, /172.21.3.61:39262 :> /172.21.3.60:44444] UNBOUND
08 五月 2013 14:34:46,268 INFO [pool-7-thread-1] (org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.handleUpstream:171) - [id: 0x34bf1d3b, /172.21.3.61:39262 :> /172.21.3.60:44444] CLOSED
08 五月 2013 14:34:46,268 INFO [pool-7-thread-1] (org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.channelClosed:209) - Connection to /172.21.3.61:39262 disconnected.
08 五月 2013 14:34:46,922 INFO [hdfs-hdfssink1-call-runner-0] (org.apache.flume.sink.hdfs.BucketWriter.doOpen:189) - Creating hdfs://kf-app1:8020//FlumeData.1367994886244.tmp

Now my questions: 1. Why is the output left as the temporary file "FlumeData.1367994886244.tmp" instead of being closed? Only after I forcibly kill or stop the agent does the log print "08 五月 2013 14:21:17,556 INFO [hdfs-hdfssink1-call-runner-5] (org.apache.flume.sink.hdfs.BucketWriter.renameBucket:379) - Renaming hdfs://kf-app1:8020/flume/FlumeData.1367993804350.tmp to hdfs://kf-app1:8020/flume/FlumeData.1367993804350". Does that mean the agent cannot close the file on its own?
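For reference, a likely explanation (assuming stock Flume 1.x HDFS sink behavior): with hdfs.rollInterval = 0 and hdfs.rollCount = 0, the sink rolls only when hdfs.rollSize (about 60 MB in the config above) is reached, so a small test file never triggers a roll and the .tmp file stays open. A sketch of a configuration that would close files on a timer instead; the values are illustrative, not verified against this setup:

```properties
# Roll (close and rename) the .tmp file every 60 seconds,
# regardless of how much data has been written.
agent1.sinks.hdfssink1.hdfs.rollInterval = 60
# Keep a size-based roll as a safety net (bytes).
agent1.sinks.hdfssink1.hdfs.rollSize = 60554432
# 0 disables event-count-based rolling.
agent1.sinks.hdfssink1.hdfs.rollCount = 0
```

With rollInterval greater than zero, the sink schedules a roll from the moment each bucket file is opened, so files get renamed without restarting the agent.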

2. Also, after I sent a second file, the log reported UNBOUND. Does that mean a single channel only accepts one file?
08 五月 2013 14:30:47,202 INFO [pool-6-thread-1] (org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.handleUpstream:171) - [id: 0x5a9b8ff9, /172.21.3.61:38652 => /172.21.3.60:44444] OPEN
08 五月 2013 14:30:47,203 INFO [pool-7-thread-1] (org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.handleUpstream:171) - [id: 0x5a9b8ff9, /172.21.3.61:38652 => /172.21.3.60:44444] BOUND: /172.21.3.60:44444
08 五月 2013 14:30:47,203 INFO [pool-7-thread-1] (org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.handleUpstream:171) - [id: 0x5a9b8ff9, /172.21.3.61:38652 => /172.21.3.60:44444] CONNECTED: /172.21.3.61:38652
08 五月 2013 14:30:47,913 INFO [pool-7-thread-1] (org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.handleUpstream:171) - [id: 0x5a9b8ff9, /172.21.3.61:38652 :> /172.21.3.60:44444] DISCONNECTED
08 五月 2013 14:30:47,913 INFO [pool-7-thread-1] (org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.handleUpstream:171) - [id: 0x5a9b8ff9, /172.21.3.61:38652 :> /172.21.3.60:44444] UNBOUND
08 五月 2013 14:30:47,913 INFO [pool-7-thread-1] (org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.handleUpstream:171) - [id: 0x5a9b8ff9, /172.21.3.61:38652 :> /172.21.3.60:44444] CLOSED
08 五月 2013 14:30:47,914 INFO [pool-7-thread-1] (org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.channelClosed:209) - Connection to /172.21.3.61:38652 disconnected.

If anyone understands these two issues or has run into them, please advise.
14 replies
hhlllp 2015-08-20
Why does my Flume receive three identical copies of the data?
蜉蝣撼大树 2015-05-26
My HDFS is a cluster configured through ZooKeeper, and the source data comes from log4j. The backend keeps reporting org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.exceptionCaught(NettyServer.java:201)] Unexpected exception from downstream.
chiweitree 2015-01-30
http://blog.csdn.net/simonchi/article/details/43231891 My blog has several articles about Flume.
fvn4edal 2014-03-04
In Flume 1.4.0, setting hdfs.idleTimeout to 5 (seconds) does the trick. This article was very enlightening: http://boylook.blog.51cto.com/7934327/1308188
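Based on this reply, the fix would be the following addition to the sink configuration in the original post. hdfs.idleTimeout closes a bucket file after the given number of seconds with no writes; the value 5 is taken from the reply above, not independently verified here:

```properties
# Close the bucket file after 5 seconds of inactivity,
# which renames FlumeData.xxx.tmp to FlumeData.xxx.
agent1.sinks.hdfssink1.hdfs.idleTimeout = 5
```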
海兰 2013-11-18
Right, did you ever solve it? Please share your experience.
small2013bird 2013-11-15
I hit the same problem. Did you solve it?
長胸為富 2013-05-25
I don't understand this, but bumping the thread and picking up some knowledge along the way.
我想飞走 2013-05-24
Following to learn.
波特王子 2013-05-17
Here is how I understand it: 1. Avro collects your logs into one file, and the "Renaming" happens only when the file reaches the configured size (or when the agent is forcibly killed). 2. UNBOUND puzzled me for a while too. My conclusion: it is not an error. Look closely, and you'll see there is no "ERROR" or similar on that line. UNBOUND just indicates that the current log file has not reached the configured size yet and does not need to be "renamed" into a finished file. After a "Renaming", a new *.tmp file is normally opened for writing. That is my understanding; corrections welcome.
u010363909 2013-05-13
Quoting reply #2 by tntzbzc:
I don't know how to solve it; I've added points for you. Let's wait for a Flume expert to answer.
Thanks for the help, moderator. This problem has been bugging me for ages and is still unsolved.
u010363909 2013-05-13
Still hoping an expert can answer; the problem remains unsolved to this day.
撸大湿 2013-05-10
I don't know how to solve it; I've added points for you. Let's wait for a Flume expert to answer.
撸大湿 2013-05-09
Long post. Bumping it; I'll read through it carefully later.