IBM Cloud Computing community
Question about running the sort example from the Hadoop examples package
yander2861
2013-04-19 04:03:06
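When I run the sort example from the examples package on the cluster, it fails with the message "not a SequenceFile".
How can I make sure that a file I upload from my local machine to the cluster is a SequenceFile?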
sen_lin8350
2013-05-20
Try the command: bin/hadoop fs -put <source> <destination>
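One thing to keep in mind: fs -put copies a file byte for byte, so a plain-text file uploaded this way is still plain text on HDFS. The "not a SequenceFile" message suggests the sort example is reading its input as a SequenceFile, a binary format whose files begin with the ASCII bytes SEQ followed by a version byte. A common way to get suitable input is to generate it with the randomwriter example from the same jar. The sketch below is in Python rather than Hadoop's Java API and only inspects the first bytes of a local file before upload. The function names and sample file names are made up for illustration, and passing this check does not guarantee that a file is a complete, valid SequenceFile.

```python
import os
import tempfile

# SequenceFiles start with the three ASCII bytes "SEQ", then a one-byte version.
SEQ_MAGIC = b"SEQ"


def looks_like_sequence_file(path):
    """Return True if the file starts with the SequenceFile magic bytes."""
    with open(path, "rb") as f:
        return f.read(len(SEQ_MAGIC)) == SEQ_MAGIC


def write_bytes(directory, name, data):
    """Hypothetical helper: create a small local file for the demo."""
    path = os.path.join(directory, name)
    with open(path, "wb") as f:
        f.write(data)
    return path


with tempfile.TemporaryDirectory() as tmp:
    # A plain-text file, like one uploaded unchanged with fs -put.
    text_file = write_bytes(tmp, "words.txt", b"hello hadoop\n")
    # Only the header is faked here (the version byte is illustrative);
    # this is not a complete SequenceFile.
    seq_like = write_bytes(tmp, "part-00000", b"SEQ\x06" + b"\x00" * 16)

    print(looks_like_sequence_file(text_file))  # False
    print(looks_like_sequence_file(seq_like))   # True
```

If the check returns False for the file you uploaded, the file has to be converted or regenerated, or the job has to be told to read text input instead; uploading it again will not change its format.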
南风bu竞
2013-05-03
Just passing by; I don't even know what Hadoop is...
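Related post: Fast global sorting of text files in Hadoop: implementation and analysis
1. Background
Hadoop provides the InputSampler and TotalOrderPartitioner classes for global sorting; org.apache.hadoop.examples.Sort shows how to call them. However, when a text file is used as input, the result is not sorted by the string column in the text, and the output is a SequenceFile.
Cause:
1) When Hadoop reads a text file, the key is a LongWritable holding the line's position in the file (its byte offset, loosely called the line number). InputSampler samples keys, and TotalOrderPartitioner also uses the key to look up the partition. The partition file produced by sampling is therefore a sample of line positions, so the result is naturally ordered by position rather than by the text.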
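The effect described above can be reproduced outside Hadoop. The following is a minimal Python sketch of sample-based total-order partitioning, not Hadoop's implementation: split_points, partition_for, and total_order are hypothetical names, and for brevity every key is used as a sample instead of a random subset. It partitions the same four lines twice, once keyed by byte offset and once keyed by the line text.

```python
import bisect


def split_points(sampled_keys, num_partitions):
    """Pick num_partitions - 1 boundary keys from the sorted samples."""
    keys = sorted(sampled_keys)
    return [keys[(len(keys) * i) // num_partitions]
            for i in range(1, num_partitions)]


def partition_for(key, boundaries):
    """Keys below the first boundary go to partition 0, and so on."""
    return bisect.bisect_right(boundaries, key)


def total_order(records, num_partitions):
    """Partition (key, value) records by key, sort each partition by key,
    and return the values in partition order."""
    boundaries = split_points([k for k, _ in records], num_partitions)
    buckets = [[] for _ in range(num_partitions)]
    for key, value in records:
        buckets[partition_for(key, boundaries)].append((key, value))
    return [value for bucket in buckets for _, value in sorted(bucket)]


lines = ["pear", "apple", "zebra", "mango"]

# Keyed the way a text input is keyed: by the byte offset of each line.
by_offset = []
offset = 0
for line in lines:
    by_offset.append((offset, line))
    offset += len(line) + 1  # +1 for the newline

# Keyed by the text itself, which is what a sort on the string needs.
by_text = [(line, line) for line in lines]

print(total_order(by_offset, 2))  # ['pear', 'apple', 'zebra', 'mango']
print(total_order(by_text, 2))    # ['apple', 'mango', 'pear', 'zebra']
```

With offsets as keys the "sorted" output is simply the original file order, which matches the behaviour the post describes; only when the text is the key do the partition boundaries and the final order follow the strings.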