Hadoop Cluster Analysis

Lesson 2: Planning your Hadoop Cluster
Slide 1: Introduction
Hello and welcome to the second lesson of the Big Data Hadoop – Administrator Course offered by Simplilearn. This lesson focuses on Planning your Hadoop Cluster.
Let us look at the objectives of this lesson in the next screen.
Slide 2: Objectives
After completing this lesson, you will be able to describe a Hadoop cluster, explain how to plan a Hadoop cluster, and discuss hardware and network configuration. You will also be able to explain cluster management.
Let us start with the first topic in the following screen.
Slide 3: Overview of Hadoop cluster
A Hadoop cluster is a particular type of computational cluster designed specifically for storing and analysing vast quantities of …

It selects the first Data Node in the list for Block A (Data Node 1), opens a TCP connection on port 50010 and says, “Hey, get ready to receive a block, and here’s a list of two Data Nodes, Data Node 5 and Data Node 6. Go make sure they’re ready to receive this block too.” Data Node 1 then opens a TCP connection to Data Node 5 and says, “Hey, get ready to receive a block, and go make sure Data Node 6 is ready to receive this block too.” Data Node 5 will then ask Data Node 6, “Hey, are you ready to receive a block?”
In the next screen, we will understand Pipelined HDFS Write.
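The readiness handshake described above can be sketched as a small simulation. This is only an illustration of the message chain, not Hadoop's actual data transfer protocol; the function name `pipeline_setup_messages` and the tuple layout are assumptions made for this example.

```python
def pipeline_setup_messages(client, pipeline):
    """Return the chain of 'get ready' messages as (sender, receiver, downstream) tuples.

    Each receiver is told which Data Nodes come after it, so it can
    pass the same request further down the pipeline.
    """
    messages = []
    sender = client
    for position, node in enumerate(pipeline):
        downstream = pipeline[position + 1:]  # nodes this receiver must contact next
        messages.append((sender, node, downstream))
        sender = node  # the receiver becomes the sender of the next request
    return messages


# Node names taken from the lesson text; "Client" is a placeholder label.
messages = pipeline_setup_messages(
    "Client", ["Data Node 1", "Data Node 5", "Data Node 6"]
)

for sender, receiver, downstream in messages:
    print(f"{sender} -> {receiver}: get ready to receive a block; downstream = {downstream}")
```

The last Data Node in the list receives an empty downstream list, which corresponds to Data Node 6 simply being asked whether it is ready.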
Slide 9: Pipelined HDFS Write
As data for each block is written into the cluster, a replication pipeline is created between the three Data Nodes (or however many you have configured in dfs.replication). This means that while a Data Node is receiving block data, it simultaneously pushes a copy of that data to the next node in the pipeline.
This is a prime example of using the Rack Awareness data in the Name Node to improve cluster performance. Here the second and third Data Nodes in the pipeline are in the same rack, so the last leg of the pipeline does not have to cross between racks; instead, it benefits from in-rack bandwidth and low latency. The next block will not begin until this block is successfully written to all …
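To make the pipelined write and the rack placement concrete, here is a minimal sketch. It models the pipeline sequentially, whereas real Data Nodes forward data while still receiving it. The rack labels, packet names, and function names are hypothetical and chosen only for illustration.

```python
def pipelined_write(packets, pipeline):
    """Simulate a pipelined write: every node stores each packet and passes it on.

    The pipeline length plays the role of the replication factor.
    """
    stored = {node: [] for node in pipeline}
    for packet in packets:
        for node in pipeline:  # the packet travels node by node down the pipeline
            stored[node].append(packet)
    return stored


def cross_rack_hops(pipeline, rack_of):
    """Count the pipeline legs whose two ends sit in different racks."""
    return sum(
        1 for upstream, downstream in zip(pipeline, pipeline[1:])
        if rack_of[upstream] != rack_of[downstream]
    )


pipeline = ["Data Node 1", "Data Node 5", "Data Node 6"]
# Hypothetical rack assignment: the second and third nodes share a rack.
rack_of = {"Data Node 1": "rack-a", "Data Node 5": "rack-b", "Data Node 6": "rack-b"}

packets = ["packet-0", "packet-1", "packet-2"]
stored = pipelined_write(packets, pipeline)
hops = cross_rack_hops(pipeline, rack_of)

print(stored["Data Node 6"])
print("cross-rack legs:", hops)
```

With this placement only the first leg crosses racks, while the last leg stays inside one rack, which is the benefit the paragraph above describes.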

Out of the box, Hadoop provides many commands to manage clusters. Some of the important ones are listed below:
● hadoop dfsadmin -report: Check the status of the slave nodes.
● hadoop dfsadmin -refreshNodes: Refresh the list of Data Nodes.
● hadoop dfsadmin -metasave meta.log: Save the metadata of the Name Node to meta.log.
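As a rough sketch, the commands listed above could be wrapped in a small Python helper for use in an administration script. The helper names and the `DFSADMIN_ACTIONS` table are assumptions made for this example; only the three dfsadmin options come from the list above. Newer Hadoop releases typically invoke the same options as `hdfs dfsadmin`.

```python
import shutil
import subprocess

# The three dfsadmin options from the list above, keyed by hypothetical labels.
DFSADMIN_ACTIONS = {
    "report": ["-report"],
    "refresh_nodes": ["-refreshNodes"],
    "metasave": ["-metasave", "meta.log"],
}


def dfsadmin_command(action):
    """Build the argument list for one of the dfsadmin actions."""
    return ["hadoop", "dfsadmin", *DFSADMIN_ACTIONS[action]]


def run_dfsadmin(action):
    """Run the command if a hadoop executable is on the PATH, else return None."""
    command = dfsadmin_command(action)
    if shutil.which(command[0]) is None:
        return None  # no Hadoop client installed on this machine
    return subprocess.run(command, capture_output=True, text=True, check=False)


if __name__ == "__main__":
    # Only print the commands here; call run_dfsadmin() on a real cluster node.
    for action in DFSADMIN_ACTIONS:
        print(" ".join(dfsadmin_command(action)))
```

Building the argument list separately from running it keeps the command strings easy to inspect before anything is executed against a live cluster.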
Slide 20: Quiz
Following is the quiz section to check your understanding of this lesson.
Slide 31: Summary
Let us summarize what we have learned in this lesson:
The different machine and server roles involved in a Hadoop deployment, along with the typical architecture of a Hadoop cluster.
The workflow of a Hadoop cluster: how data is written, analyzed, stored and read in the cluster.
Detailed knowledge about HDFS Writes, Rack Awareness, Pipelined Writes, Name Node, Missing Replicas, Unbalanced Cluster and Balanced Cluster.
Factors that are important while planning Hadoop Cluster.
Recommended Hardware and Network Configurations for Master and Slave.
Standard Network Topology Architecture.
Slide 32: Conclusion
This concludes the lesson on Planning your Hadoop Cluster. The next lesson will focus on Hadoop Installation and
