== Overview ==
=== History ===
Today, there are many implementations of distributed file systems. The first file servers were developed by researchers in the 1970s, and Sun Microsystems' [[Network File System]] became available in the 1980s. Before file servers, people who wanted to share files used the [[sneakernet]] method, physically transporting files on storage media from place to place. Once computer networks started to proliferate, it became obvious that the existing file systems had many limitations and were unsuitable for multi-user environments. Users initially used [[FTP]] to share files.<ref>{{harvnb|Sun microsystem|p=1}}</ref> FTP first ran on the [[PDP-10]] at the end of 1973. Even with FTP, files had to be copied from the source computer onto a server and then from the server onto the destination computer, and users were required to know the physical addresses of all computers involved in the file sharing.<ref>{{harvnb|Kon|1996|p=1}}</ref>
==== Design principles ====
===== Goals =====
[[Google File System]] (GFS) and [[Hadoop Distributed File System]] (HDFS) are specifically built for handling [[batch processing]] on very large data sets.
===== Load balancing =====
Load balancing is essential for efficient operation in distributed environments. It means distributing work fairly among different servers<ref>{{harvnb|Kai|Dayang|Hui|Yintang|2013|p=23}}</ref> so that more work gets done in the same amount of time and clients are served faster. Consider a system of N chunkservers in a cloud (N being 1000, 10000, or more) in which a certain number of files are stored. Each file is split into several parts, or chunks, of fixed size (for example, 64 megabytes), and the load of each chunkserver is proportional to the number of chunks it hosts.<ref name="ReferenceA">{{harvnb|Hsiao|Chung|Shen|Chao|2013|p=2}}</ref> In a load-balanced cloud, resources can be used efficiently while maximizing the performance of MapReduce-based applications.
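The chunk-based load model described above can be illustrated with a short sketch. It assumes a fixed chunk size of 64 MB (the example value from the text), file sizes given in megabytes, and a simple round-robin placement policy; the function names (<code>split_into_chunks</code>, <code>place_round_robin</code>, <code>server_loads</code>) are hypothetical and do not correspond to any real file system's API. The only point it demonstrates is that a chunkserver's load is measured as the number of chunks it hosts.

```python
import math

CHUNK_SIZE_MB = 64  # example fixed chunk size taken from the text (assumption)


def split_into_chunks(file_sizes_mb):
    """Split each file into fixed-size chunks.

    Returns a list of (file_name, chunk_index) pairs, one per chunk.
    A file smaller than one chunk (or empty) still occupies one chunk.
    """
    chunks = []
    for name, size in file_sizes_mb.items():
        n_chunks = max(1, math.ceil(size / CHUNK_SIZE_MB))
        chunks.extend((name, i) for i in range(n_chunks))
    return chunks


def place_round_robin(chunks, num_servers):
    """Hypothetical placement policy: assign chunks to chunkservers in turn."""
    placement = {server: [] for server in range(num_servers)}
    for i, chunk in enumerate(chunks):
        placement[i % num_servers].append(chunk)
    return placement


def server_loads(placement):
    """Load of each chunkserver = number of chunks it hosts."""
    return {server: len(hosted) for server, hosted in placement.items()}


if __name__ == "__main__":
    files = {"a.log": 200, "b.dat": 64, "c.bin": 10}  # sizes in MB
    chunks = split_into_chunks(files)  # a.log -> 4 chunks, b.dat -> 1, c.bin -> 1
    print(server_loads(place_round_robin(chunks, 4)))
```

With three files of 200, 64 and 10 MB spread over four chunkservers, the 200 MB file alone yields four chunks, so the servers end up hosting two, two, one and one chunks respectively. Real systems choose placement using many more factors (replication, rack locality, free space), which this sketch ignores.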
===== Load rebalancing =====
In a cloud computing environment, failure is the norm,<ref>{{harvnb|Hsiao|Chung|Shen|Chao|2013|p=952}}</ref><ref>{{harvnb|Ghemawat|Gobioff|Leung|2003|p=1}}</ref> and chunkservers may be upgraded, replaced, or added to the system. Files can also be dynamically created, deleted, and appended to. These changes lead to load imbalance in a distributed file system, meaning that the file chunks are no longer distributed evenly across the servers.
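To make the idea of rebalancing concrete, the following is a minimal, centralized greedy sketch. It is not the algorithm proposed in the cited works; it simply assumes a global view of which chunks each chunkserver hosts and repeatedly moves one chunk from the most-loaded to the least-loaded server until no two servers differ by more than one chunk. The function name <code>rebalance</code> and the dictionary layout are hypothetical, and the sketch ignores replication, bandwidth cost and concurrent file updates.

```python
def rebalance(placement):
    """Greedily even out chunk counts across chunkservers.

    `placement` maps a chunkserver id to the list of chunks it hosts and is
    modified in place. Each step moves one chunk from the most-loaded server
    to the least-loaded one; the loop stops once the loads differ by at most
    one chunk. Returns the list of moves as (chunk, source, destination).
    """
    moves = []
    if not placement:
        return moves
    while True:
        heaviest = max(placement, key=lambda s: len(placement[s]))
        lightest = min(placement, key=lambda s: len(placement[s]))
        if len(placement[heaviest]) - len(placement[lightest]) <= 1:
            return moves
        chunk = placement[heaviest].pop()
        placement[lightest].append(chunk)
        moves.append((chunk, heaviest, lightest))


if __name__ == "__main__":
    # cs3 has just been added to the system and hosts no chunks yet.
    placement = {"cs1": ["c1", "c2", "c3", "c4", "c5"], "cs2": ["c6"], "cs3": []}
    for move in rebalance(placement):
        print(move)
    print({server: len(hosted) for server, hosted in placement.items()})
```

When a new, empty chunkserver joins a cluster that holds six chunks unevenly, three moves are enough to leave every server with two chunks. The loop always terminates, because each move between servers whose loads differ by at least two strictly reduces the spread of the load distribution.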
==== Google file system ====
{{
===== Description =====