Hadoop Streaming Job Failed (Not Successful) in Python

独享123 2018-04-29 10:11:02

Running the pipeline directly from the shell works:
cat ./sample.csv | python 1.py | sort -t $'\t' -k1,1 | python 2.py
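The local pipeline above mimics what Hadoop Streaming does. The mapper reads raw lines on stdin and writes tab-separated key/value lines, sort stands in for the shuffle, and the reducer then sees all lines for one key consecutively. The post does not show 1.py or 2.py, so the sketch below uses hypothetical stand-ins (counting rows keyed on the first CSV column) only to illustrate that data flow inside a single process. The function names mapper, reducer and run_pipeline are illustrative, not taken from the original scripts.

```python
"""Hypothetical stand-ins for 1.py (mapper) and 2.py (reducer).

The real scripts are not shown; this sketch assumes the mapper emits
"key<TAB>1" keyed on the first CSV column and the reducer sums per key.
"""
import csv
import io
from itertools import groupby


def mapper(lines):
    # Emit one "key\t1" record per non-empty CSV row (assumed key: first column).
    for row in csv.reader(lines):
        if row:
            yield f"{row[0]}\t1"


def reducer(sorted_lines):
    # Input must already be grouped by key, which `sort -t $'\t' -k1,1`
    # (or Hadoop's shuffle) guarantees.
    pairs = (line.split("\t", 1) for line in sorted_lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(v) for _, v in group)}"


def run_pipeline(text):
    # In-process equivalent of:
    #   cat sample.csv | python 1.py | sort -t $'\t' -k1,1 | python 2.py
    mapped = list(mapper(io.StringIO(text)))
    shuffled = sorted(mapped, key=lambda line: line.split("\t", 1)[0])
    return list(reducer(shuffled))


if __name__ == "__main__":
    print(run_pipeline("a,x\nb,y\na,z\n"))  # ['a\t2', 'b\t1']
```

In actual streaming scripts, the mapper and reducer would iterate over sys.stdin and print each yielded line; Hadoop, not sort, would do the grouping between them.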

But running it with Hadoop Streaming fails.
The shell script and the log are below:
hadoop jar /opt/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.6.0.jar \
-input /input/sample.csv \
-output output-streaming \
-mapper 1.py \
-reducer 2.py

18/04/29 21:47:51 WARN streaming.StreamJob: -file option is deprecated, please use generic option -files instead.
packageJobJar: [1.py, 2.py] [] /tmp/streamjob8519686841009274321.jar tmpDir=null
18/04/29 21:47:52 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
18/04/29 21:47:52 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
18/04/29 21:47:53 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
18/04/29 21:47:53 INFO mapred.FileInputFormat: Total input paths to process : 1
18/04/29 21:47:53 INFO mapreduce.JobSubmitter: number of splits:1
18/04/29 21:47:53 INFO Configuration.deprecation: mapred.job.name is deprecated. Instead, use mapreduce.job.name
18/04/29 21:47:53 INFO Configuration.deprecation: mapred.job.queue.name is deprecated. Instead, use mapreduce.job.queuename
18/04/29 21:47:53 INFO Configuration.deprecation: mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize
18/04/29 21:47:53 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
18/04/29 21:47:53 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
18/04/29 21:47:54 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local1935161388_0001
18/04/29 21:47:54 INFO mapred.LocalDistributedCacheManager: Localized file:/home/zzz/pythonstreaming/1.py as file:/opt/hadoop/hdfs_tmp/tmp/mapred/local/1525009674510/1.py
18/04/29 21:47:54 INFO mapred.LocalDistributedCacheManager: Localized file:/home/zzz/pythonstreaming/2.py as file:/opt/hadoop/hdfs_tmp/tmp/mapred/local/1525009674511/2.py
18/04/29 21:47:55 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
18/04/29 21:47:55 INFO mapreduce.Job: Running job: job_local1935161388_0001
18/04/29 21:47:55 INFO mapred.LocalJobRunner: OutputCommitter set in config null
18/04/29 21:47:55 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapred.FileOutputCommitter
18/04/29 21:47:55 INFO mapred.LocalJobRunner: Waiting for map tasks
18/04/29 21:47:55 INFO mapred.LocalJobRunner: Starting task: attempt_local1935161388_0001_m_000000_0
18/04/29 21:47:55 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
18/04/29 21:47:55 INFO mapred.MapTask: Processing split: hdfs://master:9000/input/sample.csv:0+1440
18/04/29 21:47:55 INFO mapred.MapTask: numReduceTasks: 10
18/04/29 21:47:55 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
18/04/29 21:47:55 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
18/04/29 21:47:55 INFO mapred.MapTask: soft limit at 83886080
18/04/29 21:47:55 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
18/04/29 21:47:55 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
18/04/29 21:47:55 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
18/04/29 21:47:55 INFO streaming.PipeMapRed: PipeMapRed exec [/home/zzz/pythonstreaming/./1.py]
18/04/29 21:47:55 INFO Configuration.deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
18/04/29 21:47:55 INFO Configuration.deprecation: mapred.local.dir is deprecated. Instead, use mapreduce.cluster.local.dir
18/04/29 21:47:55 INFO Configuration.deprecation: map.input.file is deprecated. Instead, use mapreduce.map.input.file
18/04/29 21:47:55 INFO Configuration.deprecation: mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
18/04/29 21:47:55 INFO Configuration.deprecation: map.input.length is deprecated. Instead, use mapreduce.map.input.length
18/04/29 21:47:55 INFO Configuration.deprecation: mapred.work.output.dir is deprecated. Instead, use mapreduce.task.output.dir
18/04/29 21:47:55 INFO Configuration.deprecation: map.input.start is deprecated. Instead, use mapreduce.map.input.start
18/04/29 21:47:55 INFO Configuration.deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
18/04/29 21:47:55 INFO Configuration.deprecation: user.name is deprecated. Instead, use mapreduce.job.user.name
18/04/29 21:47:55 INFO Configuration.deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
18/04/29 21:47:55 INFO Configuration.deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
18/04/29 21:47:55 INFO Configuration.deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
18/04/29 21:47:55 WARN partition.KeyFieldBasedPartitioner: Using deprecated num.key.fields.for.partition. Use mapreduce.partition.keypartitioner.options instead
18/04/29 21:47:55 WARN partition.KeyFieldBasedPartitioner: Using deprecated num.key.fields.for.partition. Use mapreduce.partition.keypartitioner.options instead
/home/zzz/pythonstreaming/./1.py: 1: /home/zzz/pythonstreaming/./1.py: #!/usr/bin/env: not found
18/04/29 21:47:55 INFO streaming.PipeMapRed: R/W/S=1/0/0 in:NA [rec/s] out:NA [rec/s]
18/04/29 21:47:55 INFO streaming.PipeMapRed: R/W/S=10/0/0 in:NA [rec/s] out:NA [rec/s]
18/04/29 21:47:56 INFO mapreduce.Job: Job job_local1935161388_0001 running in uber mode : false
18/04/29 21:47:56 INFO mapreduce.Job: map 0% reduce 0%
18/04/29 21:48:01 INFO mapred.LocalJobRunner: hdfs://master:9000/input/sample.csv:0+1440 > map
18/04/29 21:48:02 INFO mapreduce.Job: map 67% reduce 0%
/home/zzz/pythonstreaming/./1.py: 7: /home/zzz/pythonstreaming/./1.py: Syntax error: word unexpected (expecting "do")
18/04/29 21:48:05 INFO streaming.PipeMapRed: MRErrorThread done
18/04/29 21:48:05 INFO streaming.PipeMapRed: PipeMapRed failed!
java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 2
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:322)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:535)
at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
18/04/29 21:48:05 INFO mapred.LocalJobRunner: map task executor complete.
18/04/29 21:48:05 WARN mapred.LocalJobRunner: job_local1935161388_0001
java.lang.Exception: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 2
at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 2
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:322)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:535)
at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
18/04/29 21:48:06 INFO mapreduce.Job: Job job_local1935161388_0001 failed with state FAILED due to: NA
18/04/29 21:48:06 INFO mapreduce.Job: Counters: 25
File System Counters
FILE: Number of bytes read=2279
FILE: Number of bytes written=266186
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=1440
HDFS: Number of bytes written=0
HDFS: Number of read operations=4
HDFS: Number of large read operations=0
HDFS: Number of write operations=1
Map-Reduce Framework
Map input records=80
Map output records=0
Map output bytes=0
Map output materialized bytes=0
Input split bytes=87
Combine input records=0
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=25
CPU time spent (ms)=0
Physical memory (bytes) snapshot=0
Virtual memory (bytes) snapshot=0
Total committed heap usage (bytes)=264241152
File Input Format Counters
Bytes Read=1440
18/04/29 21:48:06 ERROR streaming.StreamJob: Job not successful!
Streaming Command Failed!
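Two lines in the log come from the mapper itself: "#!/usr/bin/env: not found" (line 1 of 1.py) and "Syntax error: word unexpected (expecting "do")" (line 7). Both have the format of /bin/sh error messages, which suggests 1.py was interpreted as a shell script rather than by Python, i.e. its shebang line was not recognized when Hadoop exec'd it. That would also explain "Map output records=0". Since 1.py itself is not shown, the cause is an assumption; one common candidate is invisible bytes before "#!", such as a UTF-8 byte-order mark left by a Windows editor. The sketch below (the function name shebang_problems is hypothetical) inspects a script's raw bytes for a BOM, a missing "#!", CRLF line endings and a missing executable bit.

```python
"""Inspect a streaming script for common shebang problems.

Sketch only: it checks raw bytes and mode bits; it cannot prove what
actually went wrong inside the Hadoop job.
"""
import os
import stat
import sys

UTF8_BOM = b"\xef\xbb\xbf"


def shebang_problems(path):
    """Return a list of problems found in `path` (empty list if none)."""
    problems = []
    with open(path, "rb") as f:
        data = f.read()
    first_line = data.split(b"\n", 1)[0]

    if data.startswith(UTF8_BOM):
        # A BOM hides "#!" from the kernel, so the file is not run by the interpreter.
        problems.append("file starts with a UTF-8 BOM before '#!'")
        first_line = first_line[len(UTF8_BOM):]
    if not first_line.startswith(b"#!"):
        problems.append("first line does not start with '#!'")
    if first_line.endswith(b"\r"):
        problems.append("shebang line ends with CR (Windows CRLF line endings)")
    if not os.stat(path).st_mode & stat.S_IXUSR:
        problems.append("file is not executable by its owner (chmod +x)")
    return problems


if __name__ == "__main__":
    for script in sys.argv[1:]:
        found = shebang_problems(script)
        print(f"{script}: {'; '.join(found) if found else 'no obvious shebang problems'}")
```

Usage: python check_shebang.py 1.py 2.py (the file name is arbitrary). Passing the interpreter explicitly, for example -mapper "python 1.py", is another commonly used way to avoid depending on the shebang at all.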
