Distributed file system for cloud: Difference between revisions

Bender the Bot (talk | contribs)
m Bibliography: HTTP → HTTPS for Carnegie Mellon CS, replaced: http://www.cs.cmu.edu/ → https://www.cs.cmu.edu/
m Minor Clean Up and Fixes, typo(s) fixed: ’s → 's (2)
Line 65:
Each file is split into multiple chunks of 64 megabytes. Each chunk is stored on a chunk server. A chunk is identified by a chunk handle, which is a globally unique 64-bit number that is assigned by the master when the chunk is first created.
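
The fixed chunk size makes it simple to work out which chunk holds a given byte of a file. The following minimal sketch illustrates that arithmetic; the function names are hypothetical and are not taken from any GFS implementation.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 megabytes, the chunk size stated above


def chunk_index(offset: int) -> int:
    """Return the zero-based index of the chunk holding the given byte offset."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    return offset // CHUNK_SIZE


def chunk_count(file_size: int) -> int:
    """Return how many chunks a file of file_size bytes occupies."""
    if file_size < 0:
        raise ValueError("file_size must be non-negative")
    # Ceiling division: a partially filled last chunk still counts as a chunk.
    return -(-file_size // CHUNK_SIZE)
```

For example, under these assumptions a 200-megabyte file occupies four chunks, the last of which is only partly filled.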
 
The master maintains all of the files' metadata, including file names, directories, and the mapping of files to the list of chunks that contain each file's data. The metadata is kept in the master server's main memory. Updates to this data are logged to an operation log on disk. This operation log is replicated onto remote machines. When the log becomes too large, a checkpoint is made and the main-memory data is stored in a [[B-tree]] structure to facilitate mapping back into main memory.<ref>{{harvnb|Krzyzanowski|2012|p=4}}</ref>
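
The interplay between the in-memory metadata, the operation log, and checkpoints can be sketched as follows. This is a toy model under stated assumptions: the class and method names are hypothetical, the log is an in-memory list rather than a replicated on-disk file, and the checkpoint is a JSON string standing in for the B-tree structure described above.

```python
import json


class MasterMetadata:
    """Toy metadata store: in-memory map, operation log, and checkpoint."""

    def __init__(self, max_log_entries: int = 3):
        self.files = {}         # file name -> list of chunk handles
        self.log = []           # operations recorded since the last checkpoint
        self.checkpoint = "{}"  # serialized snapshot (stand-in for the B-tree)
        self.max_log_entries = max_log_entries

    def add_chunk(self, name: str, handle: int) -> None:
        # Record the update in the log before applying it in memory.
        self.log.append(("add_chunk", name, handle))
        self.files.setdefault(name, []).append(handle)
        # When the log grows too large, take a checkpoint and truncate it.
        if len(self.log) > self.max_log_entries:
            self.make_checkpoint()

    def make_checkpoint(self) -> None:
        self.checkpoint = json.dumps(self.files)
        self.log.clear()

    def recover(self) -> dict:
        """Rebuild the metadata from the checkpoint plus the remaining log."""
        files = json.loads(self.checkpoint)
        for op, name, handle in self.log:
            if op == "add_chunk":
                files.setdefault(name, []).append(handle)
        return files
```

Recovery in this sketch loads the latest checkpoint and replays only the operations logged after it, which is why checkpointing keeps restart time bounded.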
 
===== Fault tolerance =====
Line 94:
On an HDFS cluster, a file is split into one or more equal-size blocks, except for the last block, which may be smaller. Each block is stored on a DataNode and may be replicated on multiple DataNodes to guarantee availability. By default, each block is replicated three times, a process called "Block Level Replication".<ref name="admaov_2">{{harvnb|Adamov|2012|p=2}}</ref>
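
Block splitting and replication can be illustrated with a short sketch. The function names, the block size passed in, and the round-robin placement are all assumptions made for illustration; the placement policy in particular is a simple stand-in and is not HDFS's actual policy.

```python
def split_into_blocks(file_size: int, block_size: int) -> list:
    """Return the size of each block; only the last one may be smaller."""
    sizes = [block_size] * (file_size // block_size)
    remainder = file_size % block_size
    if remainder:
        sizes.append(remainder)
    return sizes


def place_replicas(num_blocks: int, datanodes: list, replication: int = 3) -> list:
    """Assign each block to `replication` distinct DataNodes, round-robin."""
    if replication > len(datanodes):
        raise ValueError("not enough DataNodes for the requested replication")
    placement = []
    for block in range(num_blocks):
        # Consecutive nodes starting at a rotating offset are always distinct
        # because replication never exceeds the number of DataNodes.
        nodes = [datanodes[(block + r) % len(datanodes)] for r in range(replication)]
        placement.append(nodes)
    return placement
```

With the default replication factor of three, every block in this sketch ends up on three different DataNodes, so losing any single node leaves two copies available.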
 
The NameNode manages file system namespace operations, such as opening, closing, and renaming files and directories, and regulates file access. It also determines the mapping of blocks to DataNodes. The DataNodes are responsible for servicing read and write requests from the file system's clients, managing block allocation and deletion, and replicating blocks.<ref>{{harvnb|Yee|Thu Naing|2011|p=122}}</ref>
 
When a client wants to read or write data, it contacts the NameNode, which determines where the data should be read from or written to. The client then has the location of the DataNode and can send read or write requests to it directly.
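
The read path described above can be sketched with two toy classes. All class, method, and variable names here are hypothetical and do not correspond to the real HDFS client API; the sketch only shows that the NameNode answers location queries while the data itself is served by DataNodes.

```python
class DataNode:
    """Toy DataNode that stores block contents in memory."""

    def __init__(self, name: str):
        self.name = name
        self.blocks = {}  # block id -> bytes

    def write_block(self, block_id: int, data: bytes) -> None:
        self.blocks[block_id] = data

    def read_block(self, block_id: int) -> bytes:
        return self.blocks[block_id]


class NameNode:
    """Toy NameNode that holds only metadata: which node has which block."""

    def __init__(self):
        self.block_map = {}  # path -> ordered list of (block id, DataNode)

    def register(self, path: str, block_id: int, datanode: DataNode) -> None:
        self.block_map.setdefault(path, []).append((block_id, datanode))

    def locate(self, path: str) -> list:
        return list(self.block_map[path])


def read_file(namenode: NameNode, path: str) -> bytes:
    """Ask the NameNode for block locations, then read from the DataNodes."""
    data = b""
    for block_id, datanode in namenode.locate(path):
        data += datanode.read_block(block_id)
    return data
```

In this sketch the NameNode never handles file contents, which mirrors the separation of metadata and data traffic described in the text.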