Practical Guide to Setting Up a Hadoop and Spark Cluster Using CDH

Step-by-step instructions to set up a Hadoop and Spark cluster using the Cloudera Distribution of Hadoop (formerly CCA 131)

Language: English

Rating: 4.5/5 (482 ratings), 10,360 students

Instructor(s): Durga Viswanatha Raju Gadiraju

Last update: 2019-06-06

What you’ll learn

  • Learn Hadoop and Spark administration using CDH
  • Provision a cluster on GCP (Google Cloud Platform) to set up a Hadoop and Spark cluster using CDH
  • Set up Ansible for server automation to handle the prerequisites for a Hadoop and Spark cluster using CDH
  • Set up an 8-node cluster from scratch using CDH
  • Understand the architecture of HDFS, YARN, Spark, Hive, Hue, and more

 

Requirements

  • Basic Linux skills
  • A 64-bit computer with a minimum of 4 GB RAM
  • Operating system – Windows 10, macOS, or a Linux flavor

 

Description

Cloudera is one of the leading vendors of Hadoop and Spark distributions. In this practical guide, you will learn the step-by-step process of setting up a Hadoop and Spark cluster using CDH.

Install – Demonstrate an understanding of the installation process for Cloudera Manager, CDH, and the ecosystem projects.

  • Set up a local CDH repository

  • Perform OS-level configuration for Hadoop installation

  • Install Cloudera Manager server and agents

  • Install CDH using Cloudera Manager

  • Add a new node to an existing cluster

  • Add a service using Cloudera Manager

Configure – Perform basic and advanced configuration needed to effectively administer a Hadoop cluster

  • Configure a service using Cloudera Manager

  • Create an HDFS user’s home directory

  • Configure NameNode HA

  • Configure ResourceManager HA

  • Configure proxy for HiveServer2/Impala

Manage – Maintain and modify the cluster to support day-to-day operations in the enterprise

  • Rebalance the cluster

  • Set up alerting for excessive disk fill

  • Define and install a rack topology script

  • Install a new type of I/O compression library in the cluster

  • Revise YARN resource assignment based on user feedback

  • Commission/decommission a node
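
As an illustration of the rack topology item above, here is a minimal sketch of what such a script can look like. It follows the contract described in the Apache Hadoop rack awareness documentation: Hadoop calls the script with one or more IP addresses or hostnames as arguments and reads one rack path per argument from standard output. The host addresses and rack names below are hypothetical, and how the script is installed on a CDH cluster is not shown here.

```python
#!/usr/bin/env python3
"""Illustrative rack topology script (the host-to-rack mapping is hypothetical)."""
import sys

# Hypothetical mapping for illustration only; a real script would
# typically load this from a file maintained by the administrator.
RACK_BY_HOST = {
    "10.0.0.11": "/rack1",
    "10.0.0.12": "/rack1",
    "10.0.0.21": "/rack2",
}

# Fallback rack path for hosts that are not in the mapping.
DEFAULT_RACK = "/default-rack"


def resolve_racks(hosts):
    """Return one rack path per host, in the same order as the input."""
    return [RACK_BY_HOST.get(host, DEFAULT_RACK) for host in hosts]


if __name__ == "__main__":
    # Hadoop passes the hosts as command-line arguments and expects
    # whitespace-separated rack paths on standard output.
    print(" ".join(resolve_racks(sys.argv[1:])))
```

Keeping the lookup in a small function makes the mapping easy to check before the script is pointed at a live cluster.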

Secure – Enable relevant services and configure the cluster to meet goals defined by security policy; demonstrate knowledge of basic security practices

  • Configure HDFS ACLs

  • Install and configure Sentry

  • Configure Hue user authorization and authentication

  • Enable/configure log and query redaction

  • Create encrypted zones in HDFS

Test – Benchmark the cluster operational metrics, test system configuration for operation and efficiency

  • Execute file system commands via HttpFS

  • Efficiently copy data within a cluster/between clusters

  • Create/restore a snapshot of an HDFS directory

  • Get/set ACLs for a file or directory structure

  • Benchmark the cluster (I/O, CPU, network)

Troubleshoot – Demonstrate ability to find the root cause of a problem, optimize inefficient execution, and resolve resource contention scenarios

  • Resolve errors/warnings in Cloudera Manager

  • Resolve performance problems/errors in cluster operation

  • Determine reason for application failure

  • Configure the Fair Scheduler to resolve application delays

Our Approach

  • You will start by creating a Cloudera QuickStart VM (if you have a laptop with 16 GB RAM and a quad-core CPU). This will help you get comfortable with Cloudera Manager.

  • You will be able to sign up for GCP and use up to $300 in credit while the offer lasts. Credits are valid for up to a year.

  • You will then get a brief overview of GCP and provision 7 to 8 virtual machines using templates. You will also attach an external hard drive to configure for HDFS later.

  • Once the servers are provisioned, you will set up Ansible for server automation.

  • You will set up a local repository for Cloudera Manager and the Cloudera Distribution of Hadoop using packages.

  • You will then set up Cloudera Manager with a custom database, and then the Cloudera Distribution of Hadoop using the wizard that comes with Cloudera Manager.

  • As part of setting up the Cloudera Distribution of Hadoop, you will set up HDFS, learn HDFS commands, set up YARN, configure HDFS and YARN High Availability, learn about schedulers, set up Spark, transition to Parcels, set up Hive and Impala, and set up HBase and Kafka.

 

Who this course is for

  • System administrators who want to understand the Big Data ecosystem and set up clusters
  • Experienced Big Data administrators who want to learn how to manage Hadoop and Spark clusters set up using CDH
  • Entry-level professionals who want to learn the basics and set up Big Data clusters

 

Course content

  • Introduction – CCA 131 Cloudera Certified Hadoop and Spark Administrator
    • Introduction to the course
    • CCA 131 – Administrator – Official Page
    • Understanding required skills for the certification
    • Understanding the environment provided while taking the exam
    • Signing up for the exam
  • Getting Started – Provision instances from Google Cloud
    • Introduction
    • Setup Ubuntu using Windows Subsystem
    • Sign up for GCP
    • Create template for Big Data Server
    • Provision Servers for Big Data Cluster
    • Review Concepts
    • Setting up gcloud
    • Setup ansible on first server
    • Format JBOD
    • Cluster Topology
  • Getting Started – Setup local yum repository server – CDH
    • Introduction
    • Overview of yum
    • Setup httpd service
    • Setup local yum repository – Cloudera Manager
    • Setup local yum repository – Cloudera Distribution of Hadoop (CDH)
    • Copy repo files
  • Install CM and CDH – Setup CM, Install CDH and Setup Cloudera Management Service
    • Introduction
    • Setup Pre-requisites
    • Install Cloudera Manager
    • Licensing and Installation Options
    • Install CM and CDH on all nodes
    • CM Agents and CM Server
    • Setup Cloudera Management Service
    • Cloudera Management Service – Components
  • Install CM and CDH – Configure Zookeeper
    • Introduction
    • Learning Process
    • Setup Zookeeper
    • Review important properties
    • Zookeeper Concepts
    • Important Zookeeper Commands
  • Install CM and CDH – Configure HDFS and Understand Concepts
    • Introduction
    • Setup HDFS
    • Copy Data into HDFS
    • Copy Data into HDFS Contd
    • Components of HDFS
    • Components of HDFS Contd
    • Configuration files and Important Properties
    • Review Web UIs and log files
    • Checkpointing
    • Checkpointing Contd
    • Namenode Recovery Process
    • Configure Rack Awareness
  • Install CM and CDH – Important HDFS Commands
    • Introduction
    • Getting list of commands and help
    • Creating Directories and Changing Ownership
    • Managing Files and File Permissions – Deleting Files from HDFS
    • Managing Files and File Permissions – Copying Files Local File System and HDFS
    • Managing Files and File Permissions – Copying Files within HDFS
    • Managing Files and File Permissions – Previewing Data in HDFS
    • Managing Files and File Permissions – Changing File Permissions
    • Controlling Access using ACLs – Enable ACLs On Cluster
    • Controlling Access using ACLs – ACLs On Files
    • Controlling Access using ACLs – ACLs On Directories
    • Controlling Access using ACLs – Removing ACLs
    • Overriding Properties
    • HDFS usage commands and getting metadata
    • Creating Snapshots
    • Using CLI for administration
  • Install CM and CDH – Configure YARN + MRv2 and Understand Concepts
    • Introduction
    • Setup YARN + MR2
    • Run Simple Map Reduce Job
    • Components of YARN and MR2
    • Configuration files and Important Properties – Overview
    • Configuration files and Important Properties – Review YARN Properties
    • Configuration files and Important Properties – Review Map Reduce Properties
    • Configuration files and Important Properties – Running Jobs
    • Review Web UIs and log files
    • YARN and MR2 CLI
    • YARN Application Life Cycle
    • Map Reduce Job Execution Life Cycle
  • Install CM and CDH – Configuring HDFS and YARN HA
    • Introduction
    • High Availability – Overview
    • Configure HDFS Namenode HA
    • Review Properties – HDFS Namenode HA
    • HDFS Namenode HA – Quick Recap of HDFS typical Configuration
    • HDFS Namenode HA – Components
    • HDFS Namenode HA – Automatic failover
    • Configure YARN Resource Manager HA
    • Review – YARN Resource Manager HA
    • High Availability – Implications
  • Install CM and CDH – YARN Schedulers – FIFO, Fair, and Capacity
    • Introduction
    • Schedulers Overview
    • FIFO Scheduler
    • Introduction to Fair Scheduler
    • Configure Fair Scheduler – Configure Cluster with Fair Scheduler
    • Configure Fair Scheduler – Running Jobs Without Specifying Queue
    • Configure Fair Scheduler – Running Jobs Specifying Queue
    • Configure Fair Scheduler – Important Properties
    • Capacity Scheduler – Introduction
    • Capacity Scheduler – Configure using Cloudera Manager
    • Capacity Scheduler – Run Sample Jobs
  • Install Other Components – Spark Overview and Installation
    • Introduction
    • Setup and Validate Spark 1.6.x
    • Review Important Properties
    • Spark Execution Life Cycle
    • Convert Cluster to Parcels
    • Setup Spark 2.3.x
    • Run Spark Jobs – Spark 2.3.x
  • Install Other Components – Configuring Database Engines – Hive and Impala
    • Introduction
    • Setup Hive and Impala
    • Validating Hive and Impala
    • Components and Properties of Hive
    • Troubleshooting Hive Issues
    • Hive Commands and Queries
    • Different Query Engines
    • Components and Properties of Impala
    • Running Queries using Impala – Overview
  • Install Other Components – Configure Hadoop Ecosystem components
    • Introduction
    • Setup Oozie, Pig, Sqoop and Hue
    • Review Important Properties
    • Run Sample Oozie job
    • Run Pig Job
    • Validate Sqoop
    • Overview of Hue
  • Install Other Components – Install and Configure Kafka and HBase
    • Introduction
    • Kafka Overview
    • Setup Parcels and Add Kafka Service
    • Validate Kafka
    • Setting up HBase
    • Validate HBase
  • CCA 131 – Revision for the Exam – Install the Cluster
    • Introduction
    • Set up a local CDH Repository
    • Perform OS-level Configuration
    • Install Cloudera Manager Server and Agents
    • Install CDH using Cloudera Manager
    • Add a New Node to an Existing Cluster
    • Install – Add Host as Worker
    • Add a Service using Cloudera Manager
  • CCA 131 – Revision for the Exam – Configure the Cluster
    • Introduction
    • Configure a Service using Cloudera Manager
    • Create an HDFS user’s home directory
    • Configure NameNode HA
    • Configure ResourceManager HA
    • Configure proxy for HiveServer2/Impala – Install HA Proxy
    • Configure proxy for HiveServer2
    • Configure proxy for Impala
  • CCA 131 – Revision for the Exam – Manage the Cluster
    • Introduction
    • Rebalance the cluster
    • Set up alerting for excessive disk fill
    • Define and install a rack topology script
    • Add I/O Compression Library
    • YARN Resource Assignment
    • Commission/Decommission a node
  • CCA 131 – Revision for the Exam – Secure the Cluster
    • Introduction
    • Configure HDFS ACLs
    • Install and Configure Sentry
    • Configure Hue user authorization and authentication
    • Enable or Configure Log and Query Redaction
    • Create Encrypted Zones in HDFS – Enable Encryption
    • Create Encrypted Zones in HDFS – Create Encryption Keys and Zones
  • CCA 131 – Revision for the Exam – Test and Troubleshoot the Cluster
    • Introduction
    • Execute file system commands via HTTPFS
    • Efficiently copy data within a cluster
    • Efficiently copy data between clusters
    • Create/Restore a snapshot of an HDFS directory
    • Get/Set ACLs for a file or directory structure
    • Benchmark the cluster (I/O, CPU, network)
    • Resolve errors/warnings in Cloudera Manager
    • Resolve performance problems/errors in cluster operation

 
