Big.Data.Analytics.with.Spark.and.Hadoop.17858 download

weixin_39821620 2020-08-14 06:00:52
Key Features
This book is based on the latest 2.0 version of Apache Spark and the 2.7 version of Hadoop, integrated with the most commonly used tools.
Learn all Spark stack components including latest topics such as DataFrames, DataSets, GraphFrames, Structured Streaming, DataFrame based ML Pipelines and Spa
Related download link: //download.csdn.net/download/ramissue/9658258?utm_source=bbsseo
Key Features
  • This book is based on the latest 2.0 version of Apache Spark and the 2.7 version of Hadoop, integrated with the most commonly used tools.
  • Learn all Spark stack components, including the latest topics such as DataFrames, Datasets, GraphFrames, Structured Streaming, DataFrame-based ML Pipelines, and SparkR.
  • Integrations with frameworks such as HDFS and YARN and tools such as Jupyter, Zeppelin, NiFi, Mahout, HBase Spark Connector, GraphFrames, H2O, and Hivemall.

Book Description
This book aims to provide the fundamentals of Apache Spark and Hadoop. All Spark components (Spark Core, Spark SQL, DataFrames, Datasets, conventional Streaming, Structured Streaming, MLlib, and GraphX) and the Hadoop core components (HDFS, MapReduce, and YARN) are explored in depth with implementation examples on Spark + Hadoop clusters.

The big data analytics industry is moving away from MapReduce to Spark, so the advantages of Spark over MapReduce are explained in depth to help readers reap the benefits of in-memory speeds. The DataFrames API, the Data Sources API, and the new Dataset API are explained for building big data analytical applications. Real-time data analytics using Spark Streaming with Apache Kafka and HBase is covered to help readers build streaming applications. The new Structured Streaming concept is explained with an IoT (Internet of Things) use case. Machine learning techniques are covered using MLlib, ML Pipelines, and SparkR, and graph analytics is covered with the GraphX and GraphFrames components of Spark.

Readers will also get an opportunity to get started with web-based notebooks such as Jupyter and Apache Zeppelin, and with the data flow tool Apache NiFi, to analyze and visualize data.

What you will learn
  • Find out and implement the tools and techniques of big data analytics using Spark on Hadoop clusters, with a wide variety of tools used with Spark and Hadoop
  • Understand all the Hadoop and Spark ecosystem components
  • Get to know all the Spark components: Spark Core, Spark SQL, DataFrames, Datasets, conventional and Structured Streaming, MLlib, ML Pipelines, and GraphX
  • See batch and real-time data analytics using Spark Core, Spark SQL, and conventional and Structured Streaming
  • Get to grips with data science and machine learning using MLlib, ML Pipelines, H2O, Hivemall, GraphX, and SparkR

About the Author
Venkat Ankam has over 18 years of IT experience and over 5 years in big data technologies, working with customers to design and develop scalable big data applications. Having worked with multiple clients globally, he has extensive experience in big data analytics using Hadoop and Spark. He is a Cloudera Certified Hadoop Developer and Administrator and also a Databricks Certified Spark Developer. He is the founder and presenter of a few Hadoop and Spark meetup groups globally and loves to share knowledge with the community. Venkat has delivered hundreds of trainings, presentations, and white papers in the big data sphere. While this is his first attempt at writing a book, many more books are in the pipeline.

Table of Contents
Chapter 1: Big Data Analytics at a 10,000-Foot View
Chapter 2: Getting Started with Apache Hadoop and Apache Spark
Chapter 3: Deep Dive into Apache Spark
Chapter 4: Big Data Analytics with Spark SQL, DataFrames, and Datasets
Chapter 5: Real-Time Analytics with Spark Streaming and Structured Streaming
Chapter 6: Notebooks and Dataflows with Spark and Hadoop
Chapter 7: Machine Learning with Spark and Hadoop
Chapter 8: Building Recommendation Systems with Spark and Mahout
Chapter 9: Graph Analytics with GraphX
Chapter 10: Interactive Analytics with SparkR
Big Data Analytics with Spark is a step-by-step guide for learning Spark, an open-source, fast, general-purpose cluster computing framework for large-scale data analysis. You will learn how to use Spark for different types of big data analytics projects, including batch, interactive, graph, and stream data analysis as well as machine learning. In addition, this book will help you become a much sought-after Spark expert.

Spark is one of the hottest big data technologies. The amount of data generated today by devices, applications, and users is exploding, so there is a critical need for tools that can analyze large-scale data and unlock value from it. Spark is a powerful technology that meets that need. You can, for example, use Spark to perform low-latency computations through efficient caching and iterative algorithms; leverage the features of its shell for easy, interactive data analysis; and employ its fast batch processing and low-latency features to process your real-time data streams. As a result, adoption of Spark is growing rapidly, and it is replacing Hadoop MapReduce as the technology of choice for big data analytics.

This book provides an introduction to Spark and related big data technologies. It covers Spark core and its add-on libraries, including Spark SQL, Spark Streaming, GraphX, and MLlib. It is written for busy professionals who prefer learning a new technology from a consolidated source instead of spending countless hours on the Internet trying to pick bits and pieces from different sources.

The book also provides a chapter on Scala, the hottest functional programming language and the language that underlies Spark. You'll learn the basics of functional programming in Scala, so that you can write Spark applications in it.

What's more, Big Data Analytics with Spark provides an introduction to other big data technologies that are commonly used along with Spark, such as Hive, Avro, and Kafka. The book is therefore self-sufficient: all the technologies that you need to know to use Spark are covered. The only thing that you are expected to know is programming in any language.

There is a critical shortage of people with big data expertise, so companies are willing to pay top dollar for people with skills in areas like Spark and Scala. Reading this book and absorbing its principles will provide a boost, possibly a big boost, to your career.

Table of Contents
Chapter 1: Big Data Technology Landscape
Chapter 2: Programming in Scala
Chapter 3: Spark Core
Chapter 4: Interactive Data Analysis with Spark Shell
Chapter 5: Writing a Spark Application
Chapter 6: Spark Streaming
Chapter 7: Spark SQL
Chapter 8: Machine Learning with Spark
Chapter 9: Graph Processing with Spark
Chapter 10: Cluster Managers
Chapter 11: Monitoring
With Microsoft HDInsight, business professionals and data analysts can rapidly leverage the power of Hadoop on a flexible, scalable cloud-based platform, using Microsoft's accessible business intelligence, visualization, and productivity tools. Now, in just 24 lessons of one hour or less, you can learn all the skills and techniques you'll need to provision, configure, monitor, troubleshoot, and use HDInsight, even if you're new to big data analytics. Each short, easy lesson builds on all that's come before: you'll learn all of HDInsight's essentials as you solve real data analytics problems.

Sams Teach Yourself Big Data Analytics with Microsoft HDInsight in 24 Hours covers all this, and much more:
  • Introduction to big data, NoSQL systems, their business value proposition, and use case examples
  • Introduction to Hadoop, its architecture and ecosystem, and Microsoft HDInsight
  • Getting to know Hadoop 2.0 and the innovations it provides, such as HDFS2 and YARN
  • Quickly installing, configuring, and monitoring Hadoop (HDInsight) clusters in the cloud and automating cluster provisioning
  • Customizing the HDInsight cluster and installing additional Hadoop ecosystem projects using Script Actions
  • Administering HDInsight from the Hadoop command prompt or Microsoft PowerShell
  • Using the Microsoft Azure HDInsight Emulator for learning or development
  • Understanding HDFS, HDFS vs. Azure Blob Storage, the MapReduce Job Framework, and the Job Execution Pipeline
  • Doing big data analytics with MapReduce, writing your MapReduce programs in your choice of .NET programming language, such as C#
  • Using Hive for big data analytics, with an end-to-end scenario and a look at how Apache Tez improves performance severalfold
  • Consuming HDInsight data from Microsoft BI tools over the Hive ODBC Driver: using HDInsight with Microsoft BI and Power BI to simplify data integration, analysis, and reporting
  • Using Pig for big data transformation workflows, step by step
  • Apache HBase on HDInsight: its architecture and data model, HBase vs. Hive, and programmatically managing HBase data with C# and Apache Phoenix
  • Using Sqoop or SSIS (SQL Server Integration Services) to move data to and from HDInsight and build data integration workflows for transferring data
  • Using Oozie for scheduling, coordinating, and managing data processing workflows in an HDInsight cluster
  • Using the R programming language with HDInsight to perform statistical computing on big data sets
  • Using Apache Spark's in-memory computation model to run big data analytics up to 100 times faster than Hadoop MapReduce
  • Performing real-time stream analytics on high-velocity big data streams with Storm
  • Integrating an enterprise data warehouse with Hadoop and Microsoft Analytics Platform System (APS), formerly known as SQL Server Parallel Data Warehouse (PDW)

Step-by-step instructions walk you through common questions, issues, and tasks; Q-and-As, Quizzes, and Exercises build and test your knowledge; "Did You Know?" tips offer insider advice and shortcuts; and "Watch Out!" alerts help you avoid problems. By the time you're finished, you'll be comfortable going beyond the book to create any HDInsight app you can imagine!

Table of Contents
Part I: Understanding Big Data, Hadoop 1.0, and 2.0
Hour 1. Introduction of Big Data, NoSQL, and Business Value Proposition
Hour 2. Introduction to Hadoop, Its Architecture, Ecosystem, and Microsoft Offerings
Hour 3. Hadoop Distributed File System Versions 1.0 and 2.0
Hour 4. The MapReduce Job Framework and Job Execution Pipeline
Hour 5. MapReduce: Advanced Concepts and YARN
Part II: Getting Started with HDInsight and Understanding Its Different Components
Hour 6. Getting Started with HDInsight, Provisioning Your HDInsight Service Cluster, and Automating HDInsight Cluster Provisioning
Hour 7. Exploring Typical Components of HDFS Cluster
Hour 8. Storing Data in Microsoft Azure Storage Blob
Hour 9. Working with Microsoft Azure HDInsight Emulator
Part III: Programming MapReduce and HDInsight Script Action
Hour 10. Programming MapReduce Jobs
Hour 11. Customizing the HDInsight Cluster with Script Action
Part IV: Querying and Processing Big Data in HDInsight
Hour 12. Getting Started with Apache Hive and Apache Tez in HDInsight
Hour 13. Programming with Apache Hive, Apache Tez in HDInsight, and Apache HCatalog
Hour 14. Consuming HDInsight Data from Microsoft BI Tools over Hive ODBC Driver: Part 1
Hour 15. Consuming HDInsight Data from Microsoft BI Tools over Hive ODBC Driver: Part 2
Hour 16. Integrating HDInsight with SQL Server Integration Services
Hour 17. Using Pig for Data Processing
Hour 18. Using Sqoop for Data Movement Between RDBMS and HDInsight
Part V: Managing Workflow and Performing Statistical Computing
Hour 19. Using Oozie Workflows and Job Orchestration with HDInsight
Hour 20. Performing Statistical Computing with R
Part VI: Performing Interactive Analytics and Machine Learning
Hour 21. Performing Big Data Analytics with Spark
Hour 22. Microsoft Azure Machine Learning
Part VII: Performing Real-time Analytics
Hour 23. Performing Stream Analytics with Storm
Hour 24. Introduction to Apache HBase on HDInsight
