{{Short description|File system that allows many clients to have access}}
A '''distributed file system for cloud''' is a [[w:file system|file system]] that allows many clients to have access to data and supports operations (create, delete, modify, read, write) on that data. Each data file may be partitioned into several parts called [[Chunk (information)|chunks]]. Each chunk may be stored on a different remote machine, facilitating the parallel execution of applications. Typically, data is stored in files in a [[Hierarchical tree structure|hierarchical tree]], where the nodes represent directories. There are several ways to share files in a distributed architecture: each solution must be suitable for a certain type of application, depending on how complex the application is. Meanwhile, the security of the system must be ensured: [[w:Confidentiality|confidentiality]], [[w:Availability|availability]] and [[w:Integrity|integrity]] are the main requirements for a secure system.
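The chunking scheme just described can be sketched in a few lines of Python. This is an illustrative example only: the function name and the tiny 3-byte chunk size are invented here, whereas real systems use chunks of tens of megabytes.

```python
# Illustrative sketch (not any particular system's API): partition a
# file's bytes into fixed-size chunks, as a distributed file system
# does before placing each chunk on a different remote machine.

def split_into_chunks(data: bytes, chunk_size: int) -> list[bytes]:
    """Return the chunks of `data`, each at most `chunk_size` bytes."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

chunks = split_into_chunks(b"abcdefgh", 3)
# chunks == [b'abc', b'def', b'gh']
```

Because the chunks are independent, they can be written to, and read from, different machines in parallel.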
== Overview ==
=== History ===
Today, there are many implementations of distributed file systems. The first file servers were developed by researchers in the 1970s. Sun Microsystems' [[Network File System]] became available in the 1980s. Before that, people who wanted to share files used the [[sneakernet]] method, physically transporting files on storage media from place to place. Once computer networks started to proliferate, it became obvious that the existing file systems had many limitations and were unsuitable for multi-user environments. Users initially used [[FTP]] to share files.<ref>{{harvnb|Sun microsystem|p=1}}</ref> FTP first ran on the [[PDP-10]] at the end of 1973. Even with FTP, files needed to be copied from the source computer onto a server and then from the server onto the destination computer. Users were required to know the physical addresses of all computers involved with the file sharing.
=== Supporting techniques ===
=== Client-server architecture ===
[[Network File System]] (NFS) uses a [[client-server architecture]], which allows sharing of files between a number of machines on a network as if they were located locally, providing a standardized view. The NFS protocol allows heterogeneous client processes, possibly running on different machines and under different operating systems, to access files on a distant server, ignoring the actual ___location of the files. Relying on a single server leaves the NFS protocol with potentially low availability and poor scalability. Using multiple servers does not solve the availability problem, since each server works independently.<ref>{{harvnb|Di Sano| Di Stefano|Morana|Zito|2012|p=2}}</ref> The model of NFS is a remote file service. This model, also called the remote access model, contrasts with the upload/download model:
* Remote access model: provides transparency; the client has access to a file and sends requests to it remotely, while the file remains on the server.
* Upload/download model: the client can access the file only locally; it must download the file, make its modifications, and upload it again so that the file can be used by other clients.
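The difference between the two models can be sketched with a toy in-memory `Server` class standing in for the remote file server. All class and method names here are invented for illustration.

```python
# Toy sketch of the two file-sharing models; `Server` is an in-memory
# stand-in for a remote file server.

class Server:
    def __init__(self):
        self.files = {"doc.txt": "v1"}

    # Remote access model: the file stays on the server and the
    # client sends it individual operations.
    def read(self, name):
        return self.files[name]

    def write(self, name, data):
        self.files[name] = data


class UploadDownloadClient:
    # Upload/download model: the whole file is copied locally,
    # modified, then uploaded again for other clients to see.
    def __init__(self, server):
        self.server = server
        self.local = {}

    def download(self, name):
        self.local[name] = self.server.read(name)

    def upload(self, name):
        self.server.write(name, self.local[name])


server = Server()
server.write("doc.txt", "v2")       # remote access: a single operation

client = UploadDownloadClient(server)
client.download("doc.txt")          # copy the file locally
client.local["doc.txt"] += "+edit"  # modify the local copy
client.upload("doc.txt")            # publish it back to the server
# server.read("doc.txt") == 'v2+edit'
```

In the remote access model every operation crosses the network; in the upload/download model the network is used only to move whole files, and other clients see changes only after the upload.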
==== Design principles ====
===== Goals =====
[[Google File System]] (GFS) and [[Hadoop Distributed File System]] (HDFS) are specifically built for handling [[batch processing]] on very large data sets.
===== Load balancing =====
[[Load balancing (computing)|Load balancing]] is essential for efficient operation in distributed environments. It means distributing work fairly among different servers<ref>{{harvnb|Kai|Dayang|Hui|Yintang|2013|p=23}}</ref> in order to get more work done in the same amount of time and to serve clients faster. In a system containing N chunkservers in a cloud (N being 1000, 10000, or more), where a certain number of files are stored, each file is split into several parts or chunks of fixed size (for example, 64 megabytes), and the load of each chunkserver is proportional to the number of chunks hosted by the server.<ref name="ReferenceA">{{harvnb|Hsiao|Chung|Shen|Chao|2013|p=2}}</ref> In a load-balanced cloud, resources can be used efficiently while maximizing the performance of MapReduce-based applications.
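The proportionality between a chunkserver's load and its chunk count can be illustrated with a small Python sketch; the `placement` mapping and all names are hypothetical.

```python
# Illustrative sketch: take a chunkserver's load to be the number of
# chunks it hosts; a balanced system keeps every server close to the
# ideal load (total chunks / number of servers).

def loads(placement):
    """placement maps a chunkserver id to the list of chunk ids it hosts."""
    return {server: len(chunks) for server, chunks in placement.items()}

def ideal_load(placement):
    total_chunks = sum(len(chunks) for chunks in placement.values())
    return total_chunks / len(placement)

placement = {"cs1": ["a1", "a2", "b1"], "cs2": ["b2"], "cs3": ["c1", "c2"]}
loads(placement)       # {'cs1': 3, 'cs2': 1, 'cs3': 2}
ideal_load(placement)  # 2.0
```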
===== Load rebalancing =====
In a cloud computing environment, failure is the norm,<ref>{{harvnb|Hsiao|Chung|Shen|Chao|2013|p=952}}</ref><ref>{{harvnb|Ghemawat|Gobioff|Leung|2003|p=1}}</ref> and chunkservers may be upgraded, replaced, and added to the system. Files can also be dynamically created, deleted, and appended. That leads to load imbalance in a distributed file system, meaning that the file chunks are not distributed equitably between the servers.
Distributed file systems in clouds such as GFS and HDFS rely on central or master servers or nodes (Master for GFS and NameNode for HDFS) to manage the metadata and the load balancing. The master rebalances replicas periodically: data must be moved from one DataNode/chunkserver to another if free space on the first server falls below a certain threshold.<ref>{{harvnb|Ghemawat|Gobioff|Leung|2003|p=8}}</ref> However, this centralized approach can become a bottleneck if the master servers are unable to manage a large number of file accesses, since it adds to their already heavy load. The load rebalance problem is [[w:NP-hard|NP-hard]].<ref>{{harvnb|Hsiao|Chung|Shen|Chao|2013|p=953}}</ref>
In order to get a large number of chunkservers to work in collaboration, and to solve the problem of load balancing in distributed file systems, several approaches have been proposed, such as reallocating file chunks so that the chunks can be distributed as uniformly as possible while reducing the movement cost as much as possible.<ref name="ReferenceA" />
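One way to picture chunk reallocation is a naive greedy scheme that repeatedly moves a chunk from the most-loaded server to the least-loaded one. This is an illustration of the goal (uniform distribution with few moves) only, not the algorithm of the cited work.

```python
# Hypothetical greedy rebalancing sketch: move one chunk at a time
# from the most-loaded to the least-loaded chunkserver, stopping when
# the load gap is at most one chunk. Returns the list of moves made,
# so the movement cost (number of moves) is visible.

def rebalance(placement):
    moves = []
    while True:
        hi = max(placement, key=lambda s: len(placement[s]))
        lo = min(placement, key=lambda s: len(placement[s]))
        if len(placement[hi]) - len(placement[lo]) <= 1:
            return moves  # as uniform as possible
        chunk = placement[hi].pop()
        placement[lo].append(chunk)
        moves.append((chunk, hi, lo))

placement = {"cs1": ["a", "b", "c", "d"], "cs2": [], "cs3": ["e"]}
rebalance(placement)
# afterwards every server holds 1 or 2 chunks
```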
==== Google file system ====
===== Description =====
The master server, running in a dedicated node, is responsible for coordinating storage resources and managing the files' [[metadata]] (the equivalent of, for example, inodes in classical file systems).<ref name="Krzyzanowski_p2">{{harvnb|Krzyzanowski|2012|p=2}}</ref>
Each file is split into fixed-size chunks of 64 megabytes.
The master maintains all of the files' metadata, including file names, directories, and the mapping of files to the list of chunks that contain each file's data.
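The master's bookkeeping can be modelled as two in-memory tables: one mapping each file name to its ordered chunk list, and one mapping each chunk to the chunkservers holding a replica. This is a hypothetical sketch, not GFS's actual data structures.

```python
# Hypothetical in-memory model of the master's metadata.

class Master:
    def __init__(self):
        self.file_chunks = {}      # file name -> ordered list of chunk ids
        self.chunk_locations = {}  # chunk id  -> chunkservers holding a replica

    def create(self, name):
        self.file_chunks[name] = []

    def add_chunk(self, name, chunk_id, servers):
        self.file_chunks[name].append(chunk_id)
        self.chunk_locations[chunk_id] = list(servers)

    def lookup(self, name, chunk_index):
        """Which servers hold the i-th chunk of a file?"""
        chunk_id = self.file_chunks[name][chunk_index]
        return self.chunk_locations[chunk_id]

m = Master()
m.create("/logs/web.log")
m.add_chunk("/logs/web.log", "chunk-0001", ["cs1", "cs2", "cs3"])
m.lookup("/logs/web.log", 0)  # ['cs1', 'cs2', 'cs3']
```

Clients ask the master only for this mapping; the actual chunk data flows directly between clients and chunkservers.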
===== Fault tolerance =====
Note that when the master assigns the write operation to a replica, it increments the chunk version number and informs all of the replicas containing that chunk of the new version number. Chunk version numbers allow for update error-detection: a replica that was not updated because its chunkserver was down keeps a stale version number.<ref>{{harvnb|Krzyzanowski|2012|p=5}}</ref>
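The version-number mechanism can be sketched as follows (hypothetical names): before a write the master bumps the chunk's version and notifies the live replicas, so a replica whose chunkserver missed the update keeps the old version and is later detected as stale.

```python
# Sketch of chunk-version bookkeeping for stale-replica detection.

class VersionedChunk:
    def __init__(self, replicas):
        self.version = 1
        self.replica_versions = {r: 1 for r in replicas}

    def grant_write(self, live_replicas):
        # Master increments the version and informs reachable replicas.
        self.version += 1
        for r in live_replicas:
            self.replica_versions[r] = self.version

    def stale_replicas(self):
        # Any replica left behind on an old version missed an update.
        return [r for r, v in self.replica_versions.items()
                if v < self.version]

chunk = VersionedChunk(["cs1", "cs2", "cs3"])
chunk.grant_write(live_replicas=["cs1", "cs2"])  # cs3 was down
chunk.stale_replicas()  # ['cs3']
```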
Some new Google applications did not work well with the 64-megabyte chunk size. To solve that problem, GFS started, in 2004, to implement the [[Bigtable]] approach.
==== Hadoop distributed file system ====
{{abbr|HDFS|Hadoop Distributed File System}}, developed by the [[Apache Software Foundation]], is a distributed file system designed to hold very large amounts of data (terabytes or even petabytes). Its architecture is similar to that of GFS, i.e. a master/slave architecture.
The design concept of Hadoop is informed by Google's, with Google File System, Google MapReduce, and [[Bigtable]] being implemented by Hadoop Distributed File System (HDFS), Hadoop MapReduce, and Hadoop Base (HBase) respectively.<ref>{{harvnb|Fan-Hsun|Chi-Yuan| Li-Der| Han-Chieh|2012|p=2}}</ref> Like GFS, HDFS is suited for scenarios with write-once-read-many file access, and supports file appends and truncates in lieu of random reads and writes, in order to simplify data coherency issues.<ref>{{Cite web | url=http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html#Assumptions_and_Goals | title=Apache Hadoop 2.9.2 – HDFS Architecture}}</ref>
An HDFS cluster consists of a single NameNode and several DataNode machines. The NameNode, a master server, manages and maintains the metadata of the storage DataNodes in its RAM. DataNodes manage the storage attached to the nodes that they run on. NameNode and DataNode are software designed to run on everyday-use machines, which typically run under a GNU/Linux operating system.
On an HDFS cluster, a file is split into one or more equal-size blocks, except that the last block may be smaller. Each block is replicated on multiple DataNodes to guarantee availability; by default, each block is replicated three times, a process called "Block Level Replication".<ref name="admaov_2">{{harvnb|Adamov|2012|p=2}}</ref>
The NameNode manages the file system namespace operations such as opening, closing, and renaming files and directories, and regulates file access. It also determines the mapping of blocks to DataNodes. The DataNodes are responsible for servicing read and write requests from the file system's clients.
When a client wants to read or write data, it contacts the NameNode and the NameNode checks where the data should be read from or written to. After that, the client has the ___location of the DataNode and can send read or write requests to it.
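A toy model of this read/write path, with invented names and the default replication factor of three, might look like the following; real HDFS clients stream blocks over the network rather than through Python objects.

```python
# Toy model of the HDFS data path: the client asks the NameNode for
# block locations, then reads each block from one of its DataNodes.
# The API here is invented for illustration.

REPLICATION = 3  # HDFS default "Block Level Replication" factor

class NameNode:
    """Keeps the namespace and the block -> DataNode mapping in RAM."""
    def __init__(self):
        self.block_map = {}  # file name -> list of (block_id, replica DataNodes)

    def add_file(self, name, block_ids, all_datanodes):
        # Place each block on REPLICATION DataNodes (round-robin here).
        self.block_map[name] = [
            (b, [all_datanodes[(i + k) % len(all_datanodes)]
                 for k in range(REPLICATION)])
            for i, b in enumerate(block_ids)]

    def locate(self, name):
        return self.block_map[name]

class DataNode:
    def __init__(self, name):
        self.name = name
        self.blocks = {}  # block_id -> data

def write_file(namenode, name, blocks, datanodes):
    namenode.add_file(name, [b_id for b_id, _ in blocks], datanodes)
    contents = dict(blocks)
    for block_id, replicas in namenode.locate(name):
        for dn in replicas:  # store a copy on every replica
            dn.blocks[block_id] = contents[block_id]

def read_file(namenode, name):
    data = b""
    for block_id, replicas in namenode.locate(name):
        data += replicas[0].blocks[block_id]  # any replica would do
    return data

dns = [DataNode(f"dn{i}") for i in range(4)]
nn = NameNode()
write_file(nn, "/data/f", [("blk0", b"hello "), ("blk1", b"world")], dns)
read_file(nn, "/data/f")  # b'hello world'
```

Note that the NameNode only hands out locations; block contents flow between the client and the DataNodes.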
Distributed file systems can be optimized for different purposes. Some, such as those designed for internet services, including GFS, are optimized for scalability. Other designs for distributed file systems support performance-intensive applications usually executed in parallel.<ref>{{harvnb|Soares| Dantas†|de Macedo|Bauer|2013|p=158}}</ref> Some examples include: [[MapR FS|MapR File System]] (MapR-FS), [[Ceph (storage)|Ceph-FS]], [[BeeGFS|Fraunhofer File System (BeeGFS)]], [[Lustre (file system)|Lustre File System]], [[IBM General Parallel File System]] (GPFS), and [[Parallel Virtual File System]].
MapR-FS is a distributed file system that is the basis of the MapR Converged Platform, with capabilities for distributed file storage, a NoSQL database with multiple APIs, and an integrated message streaming system. MapR-FS is optimized for scalability, performance, reliability, and availability. Its file storage capability is compatible with the Apache Hadoop Distributed File System (HDFS) API, but with several design characteristics that distinguish it from HDFS. Among the most notable differences are that MapR-FS is a fully read/write filesystem with metadata for files and directories distributed across the namespace, so there is no NameNode.<ref name="mapr-productivity">{{cite web|last1=Perez|first1=Nicolas|title=How MapR improves our productivity and simplifies our design|url=https://medium.com/@anicolaspp/how-mapr-improves-our-productivity-and-simplify-our-design-2d777ab53120#.mvr6mmydr|website=Medium}}</ref>
Ceph-FS is a distributed file system that provides excellent performance and reliability.<ref>{{harvnb|Weil|Brandt|Miller|Long|2006|p=307}}</ref> It answers the challenges of dealing with huge files and directories, coordinating the activity of thousands of disks, providing parallel access to metadata on a massive scale, manipulating both scientific and general-purpose workloads, authenticating and encrypting on a large scale, and growing or shrinking dynamically due to frequent device decommissioning, device failures, and cluster expansions.<ref>{{harvnb|Maltzahn|Molina-Estolano|Khurana|Nelson|2010|p=39}}</ref>
High performance in distributed file systems requires efficient communication between computing nodes and fast access to the storage systems. Operations such as open, close, read, write, send, and receive need to be fast to ensure that performance. For example, each read or write request accesses disk storage, which introduces seek, rotational, and network latencies.<ref>{{harvnb|Upadhyaya|Azimov|Doan|Choi|2008|p=400}}</ref>
The data communication (send/receive) operations transfer data from the application buffer to the machine kernel, with [[Transmission Control Protocol|TCP]], which is implemented in the kernel, controlling the process. However, in case of network congestion or errors, TCP may not send the data directly.
The buffer size for file reading and writing, or file sending and receiving, is chosen at the application level. The buffer is maintained as a [[Linked list|circular linked list]]<ref>{{harvnb|Upadhyaya|Azimov|Doan|Choi|2008|p=401}}</ref> consisting of a set of BufferNodes. Each BufferNode has a DataField, which contains the data, and a pointer called NextBufferNode that points to the next BufferNode. To find the current position, two [[Pointer (computer programming)|pointers]] are used, CurrentBufferNode and EndBufferNode, which represent the last write and read positions in the buffer.
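The circular buffer just described might be sketched as follows. This is a hypothetical Python rendering of the BufferNode structure; real implementations operate on raw byte buffers in place.

```python
# Sketch of the circular linked list of BufferNodes described above:
# each node has a DataField and a NextBufferNode pointer closing the
# ring; CurrentBufferNode marks the next write position and
# EndBufferNode the next read position.

class BufferNode:
    def __init__(self):
        self.data_field = None
        self.next_buffer_node = None

class CircularBuffer:
    def __init__(self, n):
        nodes = [BufferNode() for _ in range(n)]
        for i, node in enumerate(nodes):
            node.next_buffer_node = nodes[(i + 1) % n]  # close the ring
        self.current_buffer_node = nodes[0]  # write position
        self.end_buffer_node = nodes[0]      # read position

    def write(self, data):
        self.current_buffer_node.data_field = data
        self.current_buffer_node = self.current_buffer_node.next_buffer_node

    def read(self):
        data = self.end_buffer_node.data_field
        self.end_buffer_node = self.end_buffer_node.next_buffer_node
        return data

buf = CircularBuffer(4)
buf.write(b"pkt1")
buf.write(b"pkt2")
buf.read()  # b'pkt1'
buf.read()  # b'pkt2'
```

Because the list is circular, the writer can keep filling nodes while the reader drains them, without ever reallocating the buffer.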
== Bibliography ==
* {{cite book
| last1 = Tanenbaum
| first1 = Andrew S.
| last2 = Van Steen
| first2 = Maarten
| year = 2006
| title = Distributed systems principles and paradigms
| url = http://net.pku.edu.cn/~course/cs501/2011/resource/2006-Book-distributed%20systems%20principles%20and%20paradigms%202nd%20edition.pdf
| access-date = 2014-01-10
| archive-date = 2013-08-20
| archive-url = https://web.archive.org/web/20130820190519/http://net.pku.edu.cn/~course/cs501/2011/resource/2006-Book-distributed%20systems%20principles%20and%20paradigms%202nd%20edition.pdf
| url-status = dead
}}
* {{cite web
| first = Fabio |last = Kon
| title = Distributed File Systems Past, Present and Future: A Distributed File System for 2006
| url = https://www.researchgate.net/publication/2439179
| year = 1996
| website = [[ResearchGate]]
}}
* {{cite web
| author = Pavel Bžoch
| url = http://www.kiv.zcu.cz/site/documents/verejne/vyzkum/publikace/technicke-zpravy/2012/tr-2012-02.pdf
}}
* {{cite web
| author = Sun microsystem
| url = http://www.cse.chalmers.se/~tsigas/Courses/DCDSeminar/Files/afs_report.pdf
}}
* {{cite web
| last1 = Jacobi
| first1 = Tim-Daniel
| last2 = Lingemann
| first2 = Jan
| url = http://wr.informatik.uni-hamburg.de/_media/research/labs/2012/2012-10-tim-daniel_jacobi_jan_lingemann-evaluation_of_distributed_file_systems-report.pdf
| title = Evaluation of Distributed File Systems
| access-date = 2014-01-24
| archive-date = 2014-02-03
| archive-url = https://web.archive.org/web/20140203140412/http://wr.informatik.uni-hamburg.de/_media/research/labs/2012/2012-10-tim-daniel_jacobi_jan_lingemann-evaluation_of_distributed_file_systems-report.pdf
| url-status = dead
}}
# Architecture, structure, and design:
#* {{cite book
| year = 2012
| doi = 10.1109/ClusterW.2012.27
| s2cid = 12430485
| chapter = A Novel Scalable Architecture of Cloud Storage System for Small Files Based on P2P
| isbn = 978-0-7695-4844-9
}}
#* {{cite book
| last1 = Azzedin
| first1 =Farag
| year = 2013
| doi = 10.1109/CTS.2013.6567222
| s2cid = 45293053
| pages = 155–161
| chapter = Towards a scalable HDFS architecture
}}
#* {{Cite web
| last1 = Krzyzanowski
| first1 = Paul
| year = 2012
| url = http://www.cs.rutgers.edu/~pxk/417/notes/16-dfs.pdf
| access-date = 2013-12-27
| archive-date = 2013-12-27
| archive-url = https://web.archive.org/web/20131227152320/http://www.cs.rutgers.edu/~pxk/417/notes/16-dfs.pdf
| url-status = dead
}}
#* {{cite conference
| last1 = Kobayashi | first1 = K
| last2 = Mikami| first2 = S
| title = The Gfarm File System on Compute Clouds
| conference = Parallel and Distributed Processing Workshops and Phd Forum (IPDPSW), 2011 IEEE International Symposium on
|
| doi = 10.1109/IPDPS.2011.255
}}
#* {{cite book
| last1 = Humbetov
| first1 = Shamil
| year = 2012
| doi = 10.1109/ICAICT.2012.6398489
| s2cid = 6113112
| pages = 1–5
| chapter = Data-intensive computing with map-reduce and hadoop
Line 287 ⟶ 289:
}}
#* {{cite journal
| last1 = Hsiao
| first1 =Hung-Chang
| first4 =Yu-Chang
| title = Load Rebalancing for Distributed File Systems in Clouds
| journal = IEEE Transactions on Parallel and Distributed Systems
| year = 2013
| doi = 10.1109/TPDS.2012.196
| s2cid = 11271386
| pages = 951–962
| volume=24
}}
#* {{cite book
| last1 = Kai
| first1 = Fan
| year = 2013
| doi = 10.1109/INCoS.2013.14
| s2cid = 14821266
| pages = 23–29
| chapter = An Adaptive Feedback Load Balancing Algorithm in HDFS
}}
#* {{cite book
| last1 = Upadhyaya
| first1 = B
| year = 2008
| doi = 10.1109/NCM.2008.164
| s2cid = 18933772
| pages = 400–405
| chapter = Distributed File System: Efficiency Experiments for Data Access and Communication
}}
#* {{cite book
| last1 = Soares
| first1 = Tiago S.
| year = 2013
| doi = 10.1109/WETICE.2013.12
| s2cid = 6155753
| pages = 158–163
| chapter = A Data Management in a Private Cloud Storage Environment Utilizing High Performance Distributed File Systems
Line 364 ⟶ 362:
}}
#* {{cite book
| last1 = Adamov
| first1 = Abzetdin
| year = 2012
| doi = 10.1109/ICAICT.2012.6398484
| s2cid = 16674289
| pages = 1–3
| chapter = Distributed file system as a basis of data-intensive computing
Line 376 ⟶ 373:
}}
#* {{cite journal
| author = Schwan Philip
| title = Lustre: Building a File System for 1,000-node Clusters
| year = 2003
| url = https://www.kernel.org/doc/ols/2003/ols2003-pages-380-386.pdf
| pages = 380–386
}}
#* {{cite journal
|last1 = Jones
|first1
|last2 = Koniges
|first2 = Alice
|last3 = Yates
|first3 = R. Kim
|title = Performance of the IBM General Parallel File System
|periodical = Parallel and Distributed Processing Symposium, 2000. IPDPS 2000. Proceedings. 14th International
|url = https://computing.llnl.gov/code/sio/GPFS_performance.pdf
|year = 2000
|access-date = 2014-01-24
|archive-date = 2013-02-26
|archive-url = https://web.archive.org/web/20130226053255/https://computing.llnl.gov/code/sio/GPFS_performance.pdf
|url-status = dead
}}
#* {{cite conference
|last1 = Weil
|first1 = Sage A.
|last2 = Brandt
|first2 = Scott A.
|last3 = Miller
|first3 = Ethan L.
|last4 = Long
|first4 = Darrell D. E.
|title = Ceph: A Scalable, High-Performance Distributed File System
|year = 2006
|url = http://www.ssrc.ucsc.edu/Papers/weil-osdi06.pdf
|conference = Proceedings of the 7th Conference on Operating Systems Design and Implementation (OSDI '06)
|access-date = 2014-01-24
|archive-date = 2012-03-09
|archive-url = https://web.archive.org/web/20120309021423/http://www.ssrc.ucsc.edu/Papers/weil-osdi06.pdf
|url-status = dead
}}
#* {{cite journal
| last1 = Maltzahn
| first1 = Carlos
| first4= Alex J.
| last5 = Brandt
| first5= Scott A.
| last6=Weil
| first6=Sage
| title =Ceph as a scalable alternative to the Hadoop Distributed FileSystem
| year = 2010
}}
#* {{cite book
| last1 = S.A.
| first1 = Brandt
| year = 2003
| doi = 10.1109/MASS.2003.1194865
| pages = 290–298
| chapter = Efficient metadata management in large distributed storage systems
| isbn = 978-0-7695-1914-
| citeseerx = 10.1.1.13.2537
| s2cid = 5548463
}}
#* {{cite journal
| last1 = Garth A.
| first1 = Gibson
| periodical = Communications of the ACM
| volume = 43
| pages = 37–45
| number = 11
|
| url =
| doi=10.1145/353360.353362
| s2cid = 207644891
}}
#* {{cite arXiv
| last1 = Yee
| first1 = Tin Tin
| year = 2011
| eprint=1112.2025
| class = cs.DC
}}
#* {{cite book
| last1 = Cho Cho
| first1 = Khaing
| last2 = Thinn Thu
| first2 = Naing
| s2cid = 224635
| year = 2011
| doi = 10.1109/CCIS.2011.6045066
}}
#* {{cite book
| last1 = S.A.
| first1 = Brandt
| year = 2011
| doi = 10.1109/SWS.2011.6101263
| s2cid = 14791637
| pages = 16–20
| chapter = A carrier-grade service-oriented file storage architecture for cloud computing
Line 506 ⟶ 505:
}}
#* {{cite book
| last1 = Ghemawat
| first1 =Sanjay
| pages = 29–43
| chapter = The Google file system
| isbn = 978-1-58113-757-
| s2cid =221261373
}}
# Security
#* {{cite book
| last1 = Vecchiola
| first1 = C
| last3 = Buyya
| first3 = R
| year = 2009
| doi = 10.1109/I-SPAN.2009.150
| pages = 4–16
| chapter = High-Performance Cloud Computing: A View of Scientific Applications
| isbn = 978-1-4244-5403-7
| arxiv = 0910.1979
| s2cid = 1810240
}}
#* {{cite book
| last1 = Miranda
| first1 = Mowbray
| last2 = Siani
| first2 = Pearson
| s2cid = 10130310
| year = 2009
| doi = 10.1145/1621890.1621897
Line 552 ⟶ 550:
}}
#* {{cite book
| last1 = Naehrig
| first1 = Michael
| chapter = Can homomorphic encryption be practical?
| isbn = 978-1-4503-1004-8
| citeseerx = 10.1.1.225.8007
| s2cid = 12274859
}}
#* {{cite book
| last1 = Du
| first1 = Hongtao
| year = 2012
| doi = 10.1109/MIC.2012.6273264
| s2cid = 40685246
| pages = 327–331
| chapter = PsFS: A high-throughput parallel file system for secure Cloud Storage system
Line 580 ⟶ 578:
}}
#* {{cite journal
| last1 = A.Brandt
| first1 = Scott
| last4 = Xue
| first4 = Lan
| title = Efficient Metadata Management in Large Distributed Storage Systems
| periodical = 11th NASA Goddard Conference on Mass Storage Systems and Technologies, San Diego, CA
| year = 2003
| url = http://www.ssrc.ucsc.edu/Papers/brandt-mss03.pdf
| access-date = 2013-12-27
| archive-date = 2013-08-22
| archive-url = https://web.archive.org/web/20130822213717/http://www.ssrc.ucsc.edu/Papers/brandt-mss03.pdf
| url-status = dead
}}
#* {{cite journal
| author = Lori M. Kaufman
| s2cid = 16233643
| title =Data Security in the World of Cloud Computing
| journal = IEEE Security & Privacy
| year = 2009
| doi = 10.1109/MSP.2009.87
| issue = 4
}}
#* {{cite book
| last1 = Bowers
| first1 = Kevin
| last3 = Oprea
| first3 =Alina
| title = Proceedings of the 16th ACM conference on Computer and communications security
| chapter = HAIL: A high-availability and integrity layer for cloud storage
| s2cid = 207176701
| year = 2009
| doi = 10.1145/1653662.1653686
}}
#* {{cite journal
| last1 = Juels
| first1 = Ari
| last2 = Oprea
| first2 =Alina
| s2cid = 17596621
| title = New approaches to security and availability for cloud data
| doi = 10.1145/2408776.2408793
| pages = 64–73
| journal = Communications of the ACM
}}
#* {{cite book
| last1 = Zhang
| first1 = Jing
| year = 2012
| doi = 10.1109/Grid.2012.17
| s2cid = 10778240
| pages = 12–21
| chapter = A Distributed Cache for Hadoop Distributed File System in Real-Time Cloud Services
Line 651 ⟶ 650:
}}
#* {{cite book
| last1 = A.
| first1 = Pan
| year = 2012
| doi = 10.1109/SC.Companion.2012.103
| s2cid = 5554936
| pages = 753–759
| chapter = Integrating High Performance File Systems in a Cloud Computing Environment
Line 671 ⟶ 669:
}}
#* {{cite book
| last1 = Fan-Hsun
| first1 = Tseng
| year = 2012
| doi = 10.1109/ISPACS.2012.6473485
| s2cid = 18260943
| pages = 227–232
| chapter = Implement a reliable and secure cloud distributed file system
Line 689 ⟶ 686:
}}
#* {{cite book
| last1 = Di Sano
| first1 = M
| year = 2012
| doi = 10.1109/WETICE.2012.104
| s2cid = 19798809
| pages = 173–178
| chapter = File System As-a-Service: Providing Transient and Consistent Views of Files to Cooperating Applications in Clouds
Line 707 ⟶ 703:
}}
#* {{cite journal
| last1 = Zhifeng
| first1 = Xiao
| last2 = Yang
| first2 = Xiao
| s2cid = 206583820
| title = Security and Privacy in Cloud Computing
| periodical = IEEE Communications Surveys
| year = 2013
| doi = 10.1109/SURV.2012.060912.00182
| volume=15
| issue = 2
| citeseerx = 10.1.1.707.3980
}}
#* {{Cite web
| last1 = John B
| first1 = Horrigan
| year = 2008
| url = http://www.pewinternet.org/~/media//Files/Reports/2008/PIP_Cloud.Memo.pdf.pdf
| access-date = 2013-12-27
| archive-date = 2013-07-12
| archive-url = https://web.archive.org/web/20130712182757/http://www.pewinternet.org/~/media//Files/Reports/2008/PIP_Cloud.Memo.pdf.pdf
| url-status = dead
}}
#* {{cite journal
| last1 = Yau
| first1 = Stephen
}}
#* {{cite book
| last1 = Carnegie
| first1 = Bin Fan
| last4 = Gibson
| first4 = Garth
| title = Proceedings of the 4th Annual Workshop on Petascale Data Storage
| chapter = DiskReduce: RAID for data-intensive scalable computing
| s2cid = 15194567
| year = 2009
| doi = 10.1145/1713072.1713075
| pages = 6–10
| isbn = 978-1-60558-883-4
}}
#* {{cite book
| last1 = Wang
| first1 = Jianzong
| last4 = Xie
| first4 = Changsheng
| s2cid = 16827141
| year = 2012
| doi = 10.1109/Grid.2012.29
Line 775 ⟶ 773:
| isbn = 978-1-4673-2901-9
}}
#* {{cite book
| last1 = Abu-Libdeh
| first1 = Hussam
| last3 = Weatherspoon
| first3 = Hakim
| title =
| chapter = RACS: A case for cloud storage diversity
| s2cid = 1283873
| year = 2010
| doi = 10.1145/1807128.1807165
Line 791 ⟶ 789:
}}
#* {{cite journal
| last1 = Vogels
| first1 = Werner
| doi = 10.1145/1435417.1435432
| pages = 40–44
| journal=Communications of the ACM
| doi-access = free
}}
#* {{cite book
| last1 = Cuong
| first1 = Pham
| last4 = Iyer
| first4 =R.K
| s2cid = 9920903
| year = 2012
| doi = 10.1109/DSNW.2012.6264687
Line 818 ⟶ 816:
}}
#* {{cite book
| last1 = A.
| first1 = Undheim
| last3 = P.
| first3 = Heegaard
| s2cid = 15047580
| year = 2011
| doi = 10.1109/Grid.2011.25
Line 832 ⟶ 830:
| isbn = 978-1-4577-1904-2
}}
#* {{cite book
| last1 = Qian
| first1 = Haiyang
| last3 = T.
| first3 = Trivedi
| title = 12th IFIP/IEEE International Symposium on Integrated Network Management (IM 2011) and Workshops
| chapter = A hierarchical model to evaluate quality of experience of online services hosted by cloud computing
| year = 2011
| doi = 10.1109/INM.2011.5990680
| pages = 105–112
| isbn = 978-1-4244-9219-0
| citeseerx = 10.1.1.190.5148
| s2cid = 15912111
}}
#* {{cite book
| last1 = Ateniese
| first1 = Giuseppe
| last7 = Song
| first7 = Dawn
| s2cid = 8010083
| year = 2007
| doi = 10.1145/1315245.1315318
Line 868 ⟶ 869:
| chapter = Provable data possession at untrusted stores
| isbn = 978-1-59593-703-2
| url = https://figshare.com/articles/journal_contribution/6469184
}}
#* {{cite book
| last1 = Ateniese
| first1 = Giuseppe
| chapter = Scalable and efficient provable data possession
| isbn = 978-1-60558-241-2
| citeseerx = 10.1.1.208.8270
| s2cid = 207170639
}}
#* {{cite book
| last1 = Erway
| first1 = Chris
| last4 = Papamanthou
| first4 = Charalampos
| s2cid = 52856440
| year = 2009
| doi = 10.1145/1653662.1653688
Line 903 ⟶ 906:
| isbn = 978-1-60558-894-0
}}
#* {{cite book
| last1 = Juels
| first1 = Ari
| last2 = S. Kaliski
| first2 = Burton
| title = Proceedings of the 14th ACM conference on Computer and communications security
| chapter = Pors: Proofs of retrievability for large files
| s2cid = 6032317
| year = 2007
| doi = 10.1145/1315245.1315317
}}
#* {{cite book
| last1 = Bonvin
| first1 =Nicolas
Line 925 ⟶ 927:
| last3 = Aberer
| first3 = Karl
| s2cid = 3261817
| year = 2009
| doi = 10.1145/1807128.1807162
Line 930 ⟶ 933:
| chapter = A self-organized, fault-tolerant and scalable replication scheme for cloud storage
| isbn = 978-1-4503-0036-0
| url =http://infoscience.epfl.ch/record/146774
}}
#* {{cite journal
| last1 = Tim
| first1 = Kraska
| title = Consistency rationing in the cloud: pay only when it matters
| year = 2009
| pages = 253–264
| journal=Proceedings of the VLDB Endowment
}}
#* {{cite journal
| last1 = Daniel
| first1 = J. Abadi
| title = Data Management in the Cloud: Limitations and Opportunities
| citeseerx=10.1.1.178.200
| year = 2009
}}
# Synchronization
#* {{cite book
| last1 = Uppoor
| first1 = S
| doi = 10.1109/CLUSTERWKSP.2010.5613087
| pages = 1–4
| s2cid = 14577793
| chapter = Cloud-based synchronization of distributed file system hierarchies
| isbn = 978-1-4244-8395-2
}}
#* {{cite conference
| last1 = Marston
| first1 = Sean
| first5 = Anand
| title = Cloud computing — The business perspective
| conference = Decision Support Systems Volume 51, Issue 1
| year = 2011
| doi = 10.1016/j.dss.2010.12.006
| year = 2011
| doi = 10.1109/3PGCIC.2011.37
| s2cid = 13393620
| pages =193–199
| chapter = Suitability of Cloud Computing for Scientific Data Analyzing Applications; an Empirical Study
| isbn = 978-1-4577-1448-1
[[Category:Cloud storage]]
[[Category:Cloud computing]]