Discuss The Advantages And Disadvantages Of Distributed File System

HDFS: Hadoop Distributed File System
Big Data is a large collection of data sets that can be structured or unstructured. Because these data sets are so large, they are very difficult to process with traditional database management systems and software techniques. As the volume of data collected from different sources keeps growing, we need a method that can store and process it efficiently. When a data set outgrows the storage capacity of a single physical machine, it becomes necessary to distribute the data across multiple machines. A filesystem that manages data across a network of machines is called a distributed filesystem. We therefore require a filesystem that can manage large data sets across a network and also provide scalable
HDFS stores these two separately: it keeps its metadata and edit log on the NameNode and application data on the DataNodes. In traditional data management systems or filesystems, storing and processing data means traversing the whole data set, which is costly and time-consuming. In addition, all the data is stored on a single physical machine, so if that machine crashes, all the data is lost. HDFS was designed to overcome these limitations. HDFS has the following advantages over traditional systems:
• Commodity Hardware: HDFS does not require expensive, highly reliable hardware; it can run on clusters of commonly available hardware obtainable from multiple vendors.
• Fault Tolerance: An HDFS cluster can contain thousands of servers working in parallel, each storing part of the data set. With that many machines, there is always a chance that some node or server will fail, making the data stored on it unavailable. HDFS is fault tolerant: the data on each node is replicated on other nodes, so if one node fails, its data can still be retrieved from the replicas. By default, the replication factor is
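
The replication idea described under Fault Tolerance can be illustrated with a small simulation. This is only a toy sketch, not the HDFS API: the function names `place_blocks` and `readable_blocks`, the node and block names, and the round-robin placement are all assumptions made for illustration (real HDFS uses a rack-aware placement policy), and the replication factor of 2 is an arbitrary example value, not the HDFS default.

```python
def place_blocks(blocks, nodes, replication_factor):
    """Assign each block to `replication_factor` distinct nodes (round-robin).

    Hypothetical helper: a simplified stand-in for a real placement policy.
    """
    if replication_factor > len(nodes):
        raise ValueError("replication factor cannot exceed the number of nodes")
    placement = {}
    for i, block in enumerate(blocks):
        # Consecutive nodes, wrapping around, so replicas land on distinct machines.
        placement[block] = [
            nodes[(i + k) % len(nodes)] for k in range(replication_factor)
        ]
    return placement


def readable_blocks(placement, failed_nodes):
    """Return the blocks that still have at least one replica on a live node."""
    return {
        block
        for block, replicas in placement.items()
        if any(node not in failed_nodes for node in replicas)
    }


# Illustrative data: four blocks spread over three nodes.
blocks = ["blk_0", "blk_1", "blk_2", "blk_3"]
nodes = ["node_a", "node_b", "node_c"]

# With two replicas per block, losing one node leaves every block readable.
replicated = place_blocks(blocks, nodes, replication_factor=2)
print(sorted(readable_blocks(replicated, failed_nodes={"node_b"})))

# With a single copy per block, the same failure loses the blocks on that node.
unreplicated = place_blocks(blocks, nodes, replication_factor=1)
print(sorted(readable_blocks(unreplicated, failed_nodes={"node_b"})))
```

Comparing the two runs shows the trade-off: replication costs extra storage on every node, but it is what allows a cluster built from commodity hardware to keep serving data when individual machines fail.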

