{{Cat main|Apache Hadoop}}
HDFS, the Hadoop Distributed File System, hosted by the Apache Software Foundation, is a distributed file system designed to hold very large amounts of data (terabytes or even petabytes). Its architecture is similar to that of the Google File System (GFS), i.e. a master/slave architecture. HDFS is normally installed on a cluster of computers.
The design of Hadoop derives from work published by Google, namely the Google File System, Google MapReduce, and [[w:BigTable|BigTable]]. These three techniques map respectively to the Hadoop Distributed File System (HDFS), Hadoop MapReduce, and HBase (Hadoop Base).
An HDFS cluster consists of a single NameNode and several DataNode machines. The NameNode, a master server, manages and maintains the metadata of the storage DataNodes in its RAM. DataNodes manage the storage attached to the nodes they run on.
The NameNode and DataNode are software programs designed to run on commodity machines. These machines typically run a GNU/Linux OS. HDFS can be run on any machine that supports Java, and any such machine can therefore run either the NameNode or the DataNode software.<ref>{{harvnb|Azzedin|p=2|id= Azzedin}}</ref>
More explicitly, a file is split into one or more equal-size blocks, except the last block, which may be smaller. Each block is stored on multiple DataNodes: by default, each block is replicated three times to guarantee high availability, a process called "block-level replication".
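The splitting rule above can be sketched as follows. This is a toy illustration, not the Hadoop API; the 128 MiB block size is a common HDFS default, and the function name is hypothetical.

```python
# Toy sketch (not the Hadoop API): splitting a file into equal-size
# blocks, with only the last block allowed to be smaller.
DEFAULT_BLOCK_SIZE = 128 * 1024 * 1024  # 128 MiB, a common HDFS default
DEFAULT_REPLICATION = 3                 # default "block-level replication" factor

def split_into_blocks(file_size, block_size=DEFAULT_BLOCK_SIZE):
    """Return (offset, length) pairs; every block is block_size bytes
    long except possibly the last one."""
    blocks = []
    offset = 0
    while offset < file_size:
        length = min(block_size, file_size - offset)
        blocks.append((offset, length))
        offset += length
    return blocks

# A 300 MiB file yields two full 128 MiB blocks and one 44 MiB tail block.
blocks = split_into_blocks(300 * 1024 * 1024)
print(len(blocks))    # 3
print(blocks[-1][1])  # 46137344 bytes (44 MiB)
```

With the default replication factor, each of those three blocks would then be copied to three different DataNodes.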
The NameNode manages the file system namespace operations, such as opening, closing, and renaming files and directories, and regulates file access. It also determines the mapping of blocks to DataNodes. The DataNodes are responsible for serving read and write requests from the file system's clients, managing block allocation and deletion, and replicating blocks.
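The division of labour above can be sketched with a toy NameNode that keeps the block-to-DataNode mapping in memory. All names here (`ToyNameNode`, the round-robin placement, the `dn*` node names) are illustrative assumptions; real HDFS uses rack-aware placement policies, not simple round-robin.

```python
import itertools

# Hypothetical sketch of the NameNode's mapping role: it records which
# DataNodes hold each block, and answers clients' location queries.
class ToyNameNode:
    def __init__(self, datanodes, replication=3):
        self.datanodes = list(datanodes)
        self.replication = replication
        self.block_map = {}  # block id -> list of DataNode names
        # Simplified round-robin placement (real HDFS is rack-aware).
        self._rr = itertools.cycle(self.datanodes)

    def allocate_block(self, block_id):
        """Choose `replication` distinct DataNodes for a new block."""
        targets = []
        while len(targets) < min(self.replication, len(self.datanodes)):
            node = next(self._rr)
            if node not in targets:
                targets.append(node)
        self.block_map[block_id] = targets
        return targets

    def locate_block(self, block_id):
        """A client asks the NameNode where a block's replicas live,
        then reads the data directly from one of those DataNodes."""
        return self.block_map[block_id]

nn = ToyNameNode(["dn1", "dn2", "dn3", "dn4"])
nn.allocate_block("blk_0001")
print(nn.locate_block("blk_0001"))  # ['dn1', 'dn2', 'dn3']
```

Note that the NameNode only hands out metadata; the actual block data flows between the client and the DataNodes, which keeps the master off the data path.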