{{cat main}} --> {{main}} |
m Reverted edit by 154.133.103.135 (talk) to last version by JCW-CleanerBot |
(19 intermediate revisions by 11 users not shown) | |||
Line 22:
=== Client-server architecture ===
[[Network File System]] (NFS) uses a [[client-server architecture]], which allows files to be shared between a number of machines on a network as if they were located locally, providing a standardized view. The NFS protocol allows heterogeneous client processes, possibly running on different machines and under different operating systems, to access files on a distant server without regard to the actual location of the files. Relying on a single server results in the NFS protocol suffering from potentially low availability and poor scalability. Using multiple servers does not solve the availability problem, since each server works independently.<ref>{{harvnb|Di Sano| Di Stefano|Morana|Zito|2012|p=2}}</ref> The model of NFS is a remote file service. This model is also called the remote access model, in contrast with the upload/download model:
* Remote access model: Provides transparency, the client has access to a file. He
* Upload/download model: The client can access the file only locally. This means that the client has to download the file, make modifications, and upload it again so that it can be used by other clients.
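The difference between the two models can be sketched with a toy in-memory server. This is only an illustration: the <code>FileServer</code> class and its method names are hypothetical and do not correspond to the NFS protocol or to any real API.

```python
class FileServer:
    """Toy in-memory server standing in for a remote file service (hypothetical)."""

    def __init__(self) -> None:
        self.files: dict[str, bytearray] = {}

    # Remote access model: each operation acts on the server's copy.
    def read(self, name: str, offset: int, length: int) -> bytes:
        return bytes(self.files[name][offset:offset + length])

    def write(self, name: str, offset: int, data: bytes) -> None:
        self.files[name][offset:offset + len(data)] = data

    # Upload/download model: only whole files are transferred.
    def download(self, name: str) -> bytes:
        return bytes(self.files[name])

    def upload(self, name: str, content: bytes) -> None:
        self.files[name] = bytearray(content)


server = FileServer()
server.files["notes.txt"] = bytearray(b"hello world")

# Remote access: the change is visible on the server immediately.
server.write("notes.txt", 0, b"HELLO")
after_remote_write = server.download("notes.txt")

# Upload/download: the client edits a private copy, then sends it back.
local_copy = bytearray(server.download("notes.txt"))
local_copy[6:] = b"WORLD"
before_upload = server.download("notes.txt")  # server copy is still unchanged
server.upload("notes.txt", bytes(local_copy))
after_upload = server.download("notes.txt")
```

In the second case, other clients keep seeing the old content until the modified copy has been uploaded.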
Line 42:
===== Load balancing =====
[[Load balancing (computing)|Load balancing]] is essential for efficient operation in distributed environments. It means distributing work fairly among different servers,<ref>{{harvnb|Kai|Dayang|Hui|Yintang|2013|p=23}}</ref> in order to get more work done in the same amount of time and to serve clients faster. Consider a cloud system containing N chunkservers (N being 1000, 10000, or more) that stores a certain number of files. Each file is split into several parts or chunks of fixed size (for example, 64 megabytes), and the load of each chunkserver is proportional to the number of chunks it hosts.<ref name="ReferenceA">{{harvnb|Hsiao|Chung|Shen|Chao|2013|p=2}}</ref> In a load-balanced cloud, resources can be efficiently used while maximizing the performance of MapReduce-based applications.
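As a rough illustration of this load model, the following Python sketch splits files into fixed-size chunks and counts the chunks each chunkserver hosts. The round-robin placement policy, the function names, and the sample file sizes are assumptions made for the example; they are not taken from the cited sources.

```python
from collections import Counter

MB = 1024 * 1024
CHUNK_SIZE = 64 * MB  # the example chunk size mentioned above


def chunk_count(file_size: int, chunk_size: int = CHUNK_SIZE) -> int:
    """Number of fixed-size chunks needed for a file (integer ceiling division)."""
    return -(-file_size // chunk_size)


def place_round_robin(file_sizes: dict[str, int], n_servers: int) -> Counter:
    """Assign chunks to servers in round-robin order (an illustrative policy)
    and return how many chunks each server ends up hosting."""
    load = Counter({server: 0 for server in range(n_servers)})
    next_server = 0
    for name in sorted(file_sizes):
        for _ in range(chunk_count(file_sizes[name])):
            load[next_server] += 1
            next_server = (next_server + 1) % n_servers
    return load


# Hypothetical files: 200 MB -> 4 chunks, 64 MB -> 1 chunk, 1 MB -> 1 chunk.
files = {"a.log": 200 * MB, "b.dat": 64 * MB, "c.bin": 1 * MB}
load = place_round_robin(files, n_servers=3)
```

With these inputs the six chunks are spread over three servers, two chunks each, so every server carries the same load under this model.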
===== Load rebalancing =====
Line 49:
Distributed file systems in clouds such as GFS and HDFS rely on central or master servers or nodes (Master for GFS and NameNode for HDFS) to manage the metadata and the load balancing. The master rebalances replicas periodically: data must be moved from one DataNode/chunkserver to another if free space on the first server falls below a certain threshold.<ref>{{harvnb|Ghemawat|Gobioff|Leung|2003|p=8}}</ref> However, this centralized approach can turn the master servers into a bottleneck when they become unable to manage a large number of file accesses, since rebalancing adds to their already heavy loads. The load rebalancing problem is [[w:NP-hard|NP-hard]].<ref>{{harvnb|Hsiao|Chung|Shen|Chao|2013|p=953}}</ref>
In order to get a large number of chunkservers to work in collaboration, and to solve the problem of load balancing in distributed file systems, several approaches have been proposed, such as reallocating file chunks so that the chunks can be distributed as uniformly as possible while reducing the movement cost as much as possible.<ref name="ReferenceA" />
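The goal stated above (a distribution that is as uniform as possible, with as little chunk movement as possible) can be illustrated with a simple greedy heuristic. This is a minimal sketch, not the algorithm of the cited paper: it assumes that every chunk move has the same cost and that a single coordinator knows all loads, and the server names are hypothetical.

```python
def rebalance(load: dict[str, int]) -> list[tuple[str, str]]:
    """Greedily move one chunk at a time from the most loaded server to the
    least loaded one until their loads differ by at most one chunk.

    `load` maps a server name to its chunk count and is updated in place.
    Returns the list of (source, destination) moves that were made.
    """
    moves = []
    while True:
        heavy = max(load, key=load.get)
        light = min(load, key=load.get)
        if load[heavy] - load[light] <= 1:
            return moves
        load[heavy] -= 1
        load[light] += 1
        moves.append((heavy, light))


# Hypothetical starting loads: 12 chunks on 3 chunkservers.
load = {"cs1": 9, "cs2": 2, "cs3": 1}
moves = rebalance(load)
```

In this example every server ends with four chunks after five moves, all of them out of <code>cs1</code>. Five is also the number of chunks by which <code>cs1</code> exceeded the average, so no schedule could have used fewer moves here.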
==== Google file system ====
Line 60:
The master server, running on a dedicated node, is responsible for coordinating storage resources and managing the files' [[metadata]] (the equivalent of, for example, inodes in classical file systems).<ref name="Krzyzanowski_p2">{{harvnb|Krzyzanowski|2012|p=2}}</ref>
Each file is split
The master maintains all of the files' metadata, including file names, directories, and the mapping of files to the list of chunks that contain each file's data. The metadata, including this file-to-chunk mapping, is kept in the master server's main memory. Updates to this data are logged to an operation log on disk. This operation log is replicated onto remote machines. When the log
===== Fault tolerance =====
Line 84:
{{main|Apache Hadoop}}
{{abbr|HDFS|Hadoop Distributed File System}}, developed by the [[Apache Software Foundation]], is a distributed file system designed to hold very large amounts of data (terabytes or even petabytes). Its architecture is similar to GFS, i.e. a
The design of Hadoop is informed by Google's: Google File System, Google MapReduce, and [[Bigtable]] are implemented by Hadoop Distributed File System (HDFS), Hadoop MapReduce, and Hadoop Base (HBase), respectively.<ref>{{harvnb|Fan-Hsun|Chi-Yuan| Li-Der| Han-Chieh|2012|p=2}}</ref> Like GFS, HDFS is suited for scenarios with write-once-read-many file access, and supports file appends and truncates in lieu of random writes to simplify data coherency issues.<ref>{{Cite web | url=http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html#Assumptions_and_Goals | title=Apache Hadoop 2.9.2 – HDFS Architecture}}</ref>
Line 100:
Distributed file systems can be optimized for different purposes. Some, such as those designed for internet services, including GFS, are optimized for scalability. Other designs for distributed file systems support performance-intensive applications usually executed in parallel.<ref>{{harvnb|Soares| Dantas†|de Macedo|Bauer|2013|p=158}}</ref> Some examples include: [[MapR FS|MapR File System]] (MapR-FS), [[Ceph (storage)|Ceph-FS]], [[BeeGFS|Fraunhofer File System (BeeGFS)]], [[Lustre (file system)|Lustre File System]], [[IBM General Parallel File System]] (GPFS), and [[Parallel Virtual File System]].
MapR-FS is a distributed file system that is the basis of the MapR Converged Platform, with capabilities for distributed file storage, a NoSQL database with multiple APIs, and an integrated message streaming system. MapR-FS is optimized for scalability, performance, reliability, and availability. Its file storage capability is compatible with the Apache Hadoop Distributed File System (HDFS) API but with several design characteristics that distinguish it from HDFS. Among the most notable differences are that MapR-FS is a fully read/write filesystem with metadata for files and directories distributed across the namespace, so there is no NameNode.<ref name="mapr-productivity">{{cite web|last1=Perez|first1=Nicolas|title=How MapR improves our productivity and simplifies our design|url=https://medium.com/@anicolaspp/how-mapr-improves-our-productivity-and-simplify-our-design-2d777ab53120#.mvr6mmydr|website
Ceph-FS is a distributed file system designed for high performance and reliability.<ref>{{harvnb|Weil|Brandt|Miller|Long|2006|p=307}}</ref> It addresses the challenges of dealing with huge files and directories, coordinating the activity of thousands of disks, providing parallel access to metadata on a massive scale, handling both scientific and general-purpose workloads, authenticating and encrypting on a large scale, and growing or shrinking dynamically in response to frequent device decommissioning, device failures, and cluster expansions.<ref>{{harvnb|Maltzahn|Molina-Estolano|Khurana|Nelson|2010|p=39}}</ref>
Line 185:
== Bibliography ==
* {{cite book
| last1 = Andrew
| first1 = S.Tanenbaum | last2 = Maarten
| first2 = Van Steen | year = 2006
| title = Distributed Systems: Principles and Paradigms
| url = http://net.pku.edu.cn/~course/cs501/2011/resource/2006-Book-distributed%20systems%20principles%20and%20paradigms%202nd%20edition.pdf
| access-date = 2014-01-10
| archive-date = 2013-08-20
| archive-url = https://web.archive.org/web/20130820190519/http://net.pku.edu.cn/~course/cs501/2011/resource/2006-Book-distributed%20systems%20principles%20and%20paradigms%202nd%20edition.pdf
| url-status = dead
}}
* {{cite web
| first = Fabio |last = Kon
Line 210 ⟶ 216:
* {{cite web
| last1 = Jacobi
| first1 = Tim-Daniel
| last2 = Lingemann
| first2 = Jan
| url = http://wr.informatik.uni-hamburg.de/_media/research/labs/2012/2012-10-tim-daniel_jacobi_jan_lingemann-evaluation_of_distributed_file_systems-report.pdf
| title = Evaluation of Distributed File Systems
| access-date = 2014-01-24
| archive-date = 2014-02-03
| archive-url = https://web.archive.org/web/20140203140412/http://wr.informatik.uni-hamburg.de/_media/research/labs/2012/2012-10-tim-daniel_jacobi_jan_lingemann-evaluation_of_distributed_file_systems-report.pdf
| url-status = dead
}}
# Architecture, structure, and design:
#* {{cite book
Line 230 ⟶ 240:
| year = 2012
| doi = 10.1109/ClusterW.2012.27
| s2cid = 12430485
| chapter = A Novel Scalable Architecture of Cloud Storage System for Small Files Based on P2P
Line 241 ⟶ 250:
| year = 2013
| doi = 10.1109/CTS.2013.6567222
| s2cid = 45293053
| pages = 155–161
Line 253 ⟶ 261:
| year = 2012
| url = http://www.cs.rutgers.edu/~pxk/417/notes/16-dfs.pdf
| access-date = 2013-12-27
| archive-date = 2013-12-27
| archive-url = https://web.archive.org/web/20131227152320/http://www.cs.rutgers.edu/~pxk/417/notes/16-dfs.pdf
| url-status = dead
}}
#* {{cite conference
| last1 = Kobayashi | first1 = K
Line 262 ⟶ 274:
| title = The Gfarm File System on Compute Clouds
| conference = Parallel and Distributed Processing Workshops and PhD Forum (IPDPSW), 2011 IEEE International Symposium on
| conference-url =
| doi = 10.1109/IPDPS.2011.255
}}
#* {{cite book
Line 272 ⟶ 283:
| year = 2012
| doi = 10.1109/ICAICT.2012.6398489
| s2cid = 6113112
| pages = 1–5
Line 288 ⟶ 298:
| first4 =Yu-Chang
| title = Load Rebalancing for Distributed File Systems in Clouds
|
| year = 2013
| doi = 10.1109/TPDS.2012.196
| s2cid = 11271386
| pages = 951–962
Line 309 ⟶ 318:
| year = 2013
| doi = 10.1109/INCoS.2013.14
| s2cid = 14821266
| pages = 23–29
Line 331 ⟶ 339:
| year = 2008
| doi = 10.1109/NCM.2008.164
| s2cid = 18933772
| pages = 400–405
Line 349 ⟶ 356:
| year = 2013
| doi = 10.1109/WETICE.2013.12
| s2cid = 6155753
| pages = 158–163
Line 361 ⟶ 367:
| year = 2012
| doi = 10.1109/ICAICT.2012.6398484
| s2cid = 16674289
| pages = 1–3
| chapter = Distributed file system as a basis of data-intensive computing
Line 373 ⟶ 378:
| year = 2003
| url = https://www.kernel.org/doc/ols/2003/ols2003-pages-380-386.pdf
| pages = 400–407
}}
#* {{cite journal
|
|first1 = Terry
|
|first2 = Alice
|last3 = Yates
|first3 = R. Kim
|
|
|
|year = 2000
|access-date = 2014-01-24
|archive-date = 2013-02-26
|archive-url = https://web.archive.org/web/20130226053255/https://computing.llnl.gov/code/sio/GPFS_performance.pdf
|url-status = dead
}}
#* {{cite
|
|
|
|
|
|
|
|
|
|
|
|conference = Proceedings of the 7th Conference on Operating Systems Design and Implementation (OSDI '06)
|access-date = 2014-01-24
|archive-date = 2012-03-09
|archive-url = https://web.archive.org/web/20120309021423/http://www.ssrc.ucsc.edu/Papers/weil-osdi06.pdf
|url-status = dead
}}
#* {{cite
| last1 = Maltzahn
| first1 = Carlos
Line 413 ⟶ 424:
| first4= Alex J.
| last5 = Brandt
| first5= Scott A.
| last6=Weil
| first6=Sage
| title =Ceph as a scalable alternative to the Hadoop Distributed FileSystem
| year = 2010
Line 432 ⟶ 443:
| year = 2003
| doi = 10.1109/MASS.2003.1194865
| pages = 290–298
| chapter = Efficient metadata management in large distributed storage systems
| isbn = 978-0-7695-1914-2
Line 489 ⟶ 499:
| year = 2011
| doi = 10.1109/SWS.2011.6101263
| s2cid = 14791637
| pages = 16–20
| chapter = A carrier-grade service-oriented file storage architecture for cloud computing
Line 509 ⟶ 518:
| isbn = 978-1-58113-757-6
| s2cid =221261373
}}
# Security
Line 522 ⟶ 530:
| year = 2009
| doi = 10.1109/I-SPAN.2009.150
| pages = 4–16
| chapter = High-Performance Cloud Computing: A View of Scientific Applications
| isbn = 978-1-4244-5403-7
Line 565 ⟶ 572:
| year = 2012
| doi = 10.1109/MIC.2012.6273264
| s2cid = 40685246
| pages = 327–331
| chapter = PsFS: A high-throughput parallel file system for secure Cloud Storage system
Line 580 ⟶ 586:
| last4 = Xue
| first4 = Lan
| title = Efficient Metadata Management in Large Distributed Storage Systems
| periodical = 11th NASA Goddard Conference on Mass Storage Systems and Technologies, San Diego, CA
| year = 2003
| url = http://www.ssrc.ucsc.edu/Papers/brandt-mss03.pdf
| access-date = 2013-12-27
| archive-date = 2013-08-22
| archive-url = https://web.archive.org/web/20130822213717/http://www.ssrc.ucsc.edu/Papers/brandt-mss03.pdf
| url-status = dead
}}
#* {{cite journal
| author = Lori M. Kaufman
| s2cid = 16233643
| title =Data Security in the World of Cloud Computing
|
| year = 2009
| doi = 10.1109/MSP.2009.87
Line 604 ⟶ 613:
| last3 = Oprea
| first3 =Alina
| title = Proceedings of the 16th ACM conference on Computer and communications security
| chapter = HAIL: A high-availability and integrity layer for cloud storage
| s2cid = 207176701
| year = 2009
| doi = 10.1145/1653662.1653686
Line 635 ⟶ 644:
| year = 2012
| doi = 10.1109/Grid.2012.17
| s2cid = 10778240
| pages = 12–21
| chapter = A Distributed Cache for Hadoop Distributed File System in Real-Time Cloud Services
Line 655 ⟶ 663:
| year = 2012
| doi = 10.1109/SC.Companion.2012.103
| s2cid = 5554936
| pages = 753–759
| chapter = Integrating High Performance File Systems in a Cloud Computing Environment
Line 673 ⟶ 680:
| year = 2012
| doi = 10.1109/ISPACS.2012.6473485
| s2cid = 18260943
| pages = 227–232
| chapter = Implement a reliable and secure cloud distributed file system
Line 691 ⟶ 697:
| year = 2012
| doi = 10.1109/WETICE.2012.104
| s2cid = 19798809
| pages = 173–178
| chapter = File System As-a-Service: Providing Transient and Consistent Views of Files to Cooperating Applications in Clouds
Line 718 ⟶ 723:
| year = 2008
| url = http://www.pewinternet.org/~/media//Files/Reports/2008/PIP_Cloud.Memo.pdf.pdf
| access-date = 2013-12-27
| archive-date = 2013-07-12
| archive-url = https://web.archive.org/web/20130712182757/http://www.pewinternet.org/~/media//Files/Reports/2008/PIP_Cloud.Memo.pdf.pdf
| url-status = dead
}}
#* {{cite journal
| last1 = Yau
Line 739 ⟶ 748:
| last4 = Gibson
| first4 = Garth
| title = Proceedings of the 4th Annual Workshop on Petascale Data Storage
| chapter = DiskReduce: RAID for data-intensive scalable computing
| s2cid = 15194567
| year = 2009
| doi = 10.1145/1713072.1713075
| pages = 6–10
| isbn = 978-1-60558-883-4
}}
#* {{cite book
Line 771 ⟶ 780:
| last3 = Weatherspoon
| first3 = Hakim
| title = Proceedings of the 1st ACM symposium on Cloud computing
| chapter = RACS: A case for cloud storage diversity
| s2cid = 1283873
| year = 2010
| doi = 10.1145/1807128.1807165
Line 821 ⟶ 830:
| isbn = 978-1-4577-1904-2
}}
#* {{cite
| last1 = Qian
| first1 = Haiyang
Line 828 ⟶ 837:
| last3 = T.
| first3 = Trivedi
| title = 12th IFIP/IEEE International Symposium on Integrated Network Management (IM 2011) and Workshops
| chapter = A hierarchical model to evaluate quality of experience of online services hosted by cloud computing
| year = 2011
| doi = 10.1109/INM.2011.5990680
| pages = 105–112
| isbn = 978-1-4244-9219-0
| citeseerx = 10.1.1.190.5148
| s2cid = 15912111
}}
Line 858 ⟶ 869:
| chapter = Provable data possession at untrusted stores
| isbn = 978-1-59593-703-2
| url = https://figshare.com/articles/journal_contribution/6469184
}}
#* {{cite book
Line 899 ⟶ 911:
| last2 = S. Kaliski
| first2 = Burton
| title = Proceedings of the 14th ACM conference on Computer and communications security
| chapter = PORs: Proofs of retrievability for large files
| s2cid = 6032317
| year = 2007
| doi = 10.1145/1315245.1315317
Line 937 ⟶ 949:
| journal=Proceedings of the VLDB Endowment | volume = 2 |issue= 1|doi=10.14778/1687627.1687657
}}
#* {{cite
| last1 = Daniel
| first1 = J. Abadi
Line 982 ⟶ 994:
| chapter = Provable data possession at untrusted stores
| isbn = 978-1-59593-703-2
| url = https://figshare.com/articles/journal_contribution/6469184
}}
# Synchronization
Line 995 ⟶ 1,008:
| doi = 10.1109/CLUSTERWKSP.2010.5613087
| pages = 1–4
| s2cid = 14577793
| chapter = Cloud-based synchronization of distributed file system hierarchies
Line 1,006 ⟶ 1,018:
| s2cid = 16233643
| title = Data Security in the World of Cloud Computing
|
| year = 2009
| doi = 10.1109/MSP.2009.87
Line 1,042 ⟶ 1,054:
| year = 2011
| doi = 10.1109/3PGCIC.2011.37
| s2cid = 13393620
| pages =193–199
| chapter = Suitability of Cloud Computing for Scientific Data Analyzing Applications; an Empirical Study
Line 1,052 ⟶ 1,063:
[[Category:Cloud storage]]
[[Category:Cloud computing]]