Revision as of 16:14, 22 March 2023 by 173.76.103.7 (→Hadoop distributed file system)
Revision as of 11:11, 7 July 2023 by Jerry Biggle (No edit summary; Tag: Visual edit)
Line 22:

=== Client-server architecture ===

[[Network File System]] (NFS) uses a [[client-server architecture]], which allows files to be shared between a number of machines on a network as if they were located locally, providing a standardized view. The NFS protocol allows heterogeneous client processes, possibly running on different machines and under different operating systems, to access files on a distant server, ignoring the actual location of the files. Relying on a single server results in the NFS protocol suffering from potentially low availability and poor scalability. Using multiple servers does not solve the availability problem, since each server works independently.<ref>{{harvnb|Di Sano|Di Stefano|Morana|Zito|2012|p=2}}</ref> The model of NFS is a remote file service. This model is also called the remote access model, in contrast with the upload/download model:

* Remote access model: Provides transparency; the client has access to a file and sends requests to the remote file (while the file remains on the server).<ref>{{harvnb|Andrew|Maarten|2006|p=492}}</ref>
* Upload/download model: The client can access the file only locally. The client has to download the file, make modifications, and upload it again so that it can be used by other clients.

Line 49:

Distributed file systems in clouds such as GFS and HDFS rely on central or master servers or nodes (Master for GFS and NameNode for HDFS) to manage the metadata and the load balancing. The master rebalances replicas periodically: data must be moved from one DataNode/chunkserver to another if free space on the first server falls below a certain threshold.<ref>{{harvnb|Ghemawat|Gobioff|Leung|2003|p=8}}</ref> However, this centralized approach can turn those master servers into a bottleneck if they become unable to manage a large number of file accesses, as it adds to their already heavy loads.
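The threshold rule described above can be illustrated with a minimal sketch. It is not the policy used by GFS or HDFS: the <code>Server</code> class, the <code>rebalance</code> function, the 20% threshold, and the greedy "largest chunk to emptiest server" choice are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """Hypothetical stand-in for a DataNode/chunkserver."""
    name: str
    capacity: int                               # total space, arbitrary units
    chunks: dict = field(default_factory=dict)  # chunk id -> size

    @property
    def free(self) -> int:
        return self.capacity - sum(self.chunks.values())


def rebalance(servers, min_free_fraction=0.2):
    """Greedy sketch: drain servers whose free space is below the threshold.

    Returns the list of (chunk_id, source, destination) moves performed.
    """
    moves = []
    for src in servers:
        while src.chunks and src.free < min_free_fraction * src.capacity:
            # Pick the largest chunk so that few moves are needed.
            chunk_id = max(src.chunks, key=src.chunks.get)
            size = src.chunks[chunk_id]
            # A destination must itself stay above the threshold afterwards.
            candidates = [
                s for s in servers
                if s is not src
                and s.free - size >= min_free_fraction * s.capacity
            ]
            if not candidates:
                break  # no server can take the chunk without being overloaded
            dst = max(candidates, key=lambda s: s.free)
            dst.chunks[chunk_id] = src.chunks.pop(chunk_id)
            moves.append((chunk_id, src.name, dst.name))
    return moves


servers = [
    Server("s1", capacity=100, chunks={"a": 40, "b": 30, "c": 25}),
    Server("s2", capacity=100, chunks={"d": 10}),
    Server("s3", capacity=100, chunks={}),
]
moves = rebalance(servers)
```

Because every decision is taken by a single function that sees all servers, the sketch also shows why the central master's workload grows with the size of the cluster.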
The load rebalance problem is [[w:NP-hard|NP-hard]].<ref>{{harvnb|Hsiao|Chung|Shen|Chao|2013|p=953}}</ref>

In order to get a large number of chunkservers to work in collaboration, and to solve the problem of load balancing in distributed file systems, several approaches have been proposed, such as reallocating file chunks so that the chunks are distributed as uniformly as possible while keeping the movement cost as low as possible.<ref name="ReferenceA" />

==== Google file system ====

Line 60:

The master server, running on a dedicated node, is responsible for coordinating storage resources and managing files' [[metadata]] (the equivalent of, for example, inodes in classical file systems).<ref name="Krzyzanowski_p2">{{harvnb|Krzyzanowski|2012|p=2}}</ref> Each file is split into multiple chunks of 64 megabytes. Each chunk is stored on a chunk server. A chunk is identified by a chunk handle, which is a globally unique 64-bit number that is assigned by the master when the chunk is first created.

The master maintains all of the files' metadata, including file names, directories, and the mapping of files to the list of chunks that contain each file's data. The metadata is kept in the master server's main memory, along with the mapping of files to chunks. Updates to this data are logged to an operation log on disk. This operation log is replicated onto remote machines. When the log becomes too large, a checkpoint is made and the main-memory data is stored in a [[B-tree]] structure to facilitate mapping back into the main memory.<ref>{{harvnb|Krzyzanowski|2012|p=4}}</ref>

===== Fault tolerance =====
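The operation log and checkpoint described above are what allow the master's in-memory metadata to be rebuilt after a restart. The following is a minimal sketch of that idea, not GFS code: the <code>Master</code> class, its method names, and the JSON record format are hypothetical, a Python list stands in for the replicated on-disk log, a JSON string stands in for the B-tree checkpoint, and chunk handles are drawn at random rather than being guaranteed unique.

```python
import json
import secrets


class Master:
    """Toy model of a GFS-style master; names and formats are hypothetical."""

    def __init__(self):
        self.files = {}  # file name -> ordered list of chunk handles
        self.log = []    # stand-in for the on-disk, replicated operation log

    def _apply(self, record):
        op = json.loads(record)
        if op["op"] == "create":
            self.files[op["name"]] = []
        elif op["op"] == "add_chunk":
            self.files[op["name"]].append(op["handle"])

    def _mutate(self, op):
        record = json.dumps(op)
        self.log.append(record)  # log first, then update in-memory state
        self._apply(record)

    def create(self, name):
        self._mutate({"op": "create", "name": name})

    def add_chunk(self, name):
        # 64-bit handle; random for brevity, so uniqueness is not enforced.
        handle = secrets.randbits(64)
        self._mutate({"op": "add_chunk", "name": name, "handle": handle})
        return handle

    def checkpoint(self):
        """Snapshot the metadata and truncate the log."""
        snapshot = json.dumps(self.files)
        self.log.clear()
        return snapshot

    @classmethod
    def recover(cls, snapshot, log):
        """Rebuild state from the last checkpoint plus the log written since."""
        master = cls()
        master.files = json.loads(snapshot)
        for record in log:
            master._apply(record)
        master.log = list(log)
        return master


m = Master()
m.create("/logs/a")
h1 = m.add_chunk("/logs/a")
snap = m.checkpoint()          # log is emptied here
h2 = m.add_chunk("/logs/a")    # recorded only in the log
restored = Master.recover(snap, m.log)
```

In this sketch recovery only has to replay the records written since the last checkpoint, which is the reason for checkpointing once the log grows large.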

Distributed file system for cloud: Difference between revisions