Distributed file system for cloud: Difference between revisions

Bot: Removing Commons:File:UploadDownload.PNG (en). It was deleted on Commons by Jcb (Copyright violation; see Commons:Licensing (F1) -).
KolbertBot (talk | contribs)
m Task #2 : Remove link referral data
Line 20:
== Architectures ==
Most distributed file systems are built on the client-server architecture, but decentralized solutions exist as well.
 
=== Client-server architecture ===
[[Network File System]] (NFS) uses a [[client-server architecture]], which allows files to be shared among a number of machines on a network as if they were located locally, providing a standardized view. The NFS protocol allows heterogeneous client processes, possibly running on different machines and under different operating systems, to access files on a remote server without regard to the actual location of the files. Because it relies on a single server, the NFS protocol suffers from potentially low availability and poor scalability. Using multiple servers does not solve the availability problem, since each server works independently.<ref>{{harvnb|Di Sano| Di Stefano|Morana|Zito|2012|p=2}}</ref> NFS follows the remote file service model, also called the remote access model, in contrast with the upload/download model.
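The difference between the two models can be illustrated with a minimal Python sketch. It assumes a hypothetical in-memory <code>FileServer</code> class; the class, its method names, and the two helper functions are illustrative inventions and do not correspond to the NFS protocol or to any real library. In the remote access model every operation is carried out on the server, whereas in the upload/download model the client fetches the whole file, edits a local copy, and sends the whole file back.

```python
class FileServer:
    """Hypothetical in-memory stand-in for a file server (not NFS itself)."""

    def __init__(self):
        self._files = {}  # path -> bytearray holding the file contents

    def create(self, path, data=b""):
        self._files[path] = bytearray(data)

    # Remote access model: each operation is executed on the server.
    def read(self, path, offset, length):
        return bytes(self._files[path][offset:offset + length])

    def write(self, path, offset, data):
        buf = self._files[path]
        buf[offset:offset + len(data)] = data

    # Upload/download model: whole files move between client and server.
    def download(self, path):
        return bytes(self._files[path])

    def upload(self, path, data):
        self._files[path] = bytearray(data)


def remote_access_edit(server, path):
    # Only the touched bytes cross the (simulated) network.
    server.write(path, 0, b"H")
    return server.read(path, 0, 5)


def upload_download_edit(server, path):
    local = bytearray(server.download(path))  # fetch the whole file
    local[0:1] = b"J"                         # edit the local copy
    server.upload(path, bytes(local))         # push the whole file back
    return bytes(local[:5])
```

In this sketch the remote access variant transfers only the bytes it reads or writes, while the upload/download variant transfers the entire file twice. That is the usual trade-off between the two models: fewer bytes per operation in the first, fewer round trips for heavily edited files in the second.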
Line 102 ⟶ 103:
Distributed file systems can be optimized for different purposes. Some, such as those designed for internet services, including GFS, are optimized for scalability. Other designs for distributed file systems support performance-intensive applications usually executed in parallel.<ref>{{harvnb|Soares| Dantas†|de Macedo|Bauer|2013|p=158}}</ref> Some examples include: [[MapR FS|MapR File System]] (MapR-FS), [[Ceph (storage)|Ceph-FS]], [[BeeGFS|Fraunhofer File System (BeeGFS)]], [[Lustre (file system)|Lustre File System]], [[IBM General Parallel File System]] (GPFS), and [[Parallel Virtual File System]].
 
MapR-FS is a distributed file system that is the basis of the MapR Converged Platform, with capabilities for distributed file storage, a NoSQL database with multiple APIs, and an integrated message streaming system. MapR-FS is optimized for scalability, performance, reliability, and availability. Its file storage capability is compatible with the Apache Hadoop Distributed File System (HDFS) API but has several design characteristics that distinguish it from HDFS. Among the most notable differences is that MapR-FS is a fully read/write filesystem with metadata for files and directories distributed across the namespace, so there is no NameNode.<ref name="mapr-productivity">{{cite web|last1=Perez|first1=Nicolas|title=How MapR improves our productivity and simplifies our design|url=https://medium.com/@anicolaspp/how-mapr-improves-our-productivity-and-simplify-our-design-2d777ab53120#.mvr6mmydr|website=Medium|publisher=Medium|accessdate=June 21, 2016}}</ref><ref>{{cite web|last1=Woodie|first1=Alex|title=From Hadoop to Zeta: Inside MapR’s Convergence Conversion|url=http://www.datanami.com/2016/03/08/from-hadoop-to-zeta-inside-maprs-convergence-conversion/|website=Datanami|publisher=Tabor Communications Inc.|accessdate=June 21, 2016}}</ref><ref>{{cite web|last1=Brennan|first1=Bob|title=Flash Memory Summit|url=https://www.youtube.com/watch?v=fOT63zR7PvU&feature=youtu.be&t=1682|website=youtube|publisher=Samsung|accessdate=June 21, 2016}}</ref><ref name="maprfs-video">{{cite web|last1=Srivas|first1=MC|title=MapR File System|url=https://www.youtube.com/watch?v=fP4HnvZmpZI|website=Hadoop Summit 2011|publisher=Hortonworks|accessdate=June 21, 2016}}</ref><ref name="real-world-hadoop">{{cite book|last1=Dunning|first1=Ted|last2=Friedman|first2=Ellen|title=Real World Hadoop|date=January 2015|publisher=O'Reilly Media, Inc|location=Sebastopol, CA|isbn=978-1-4919-2395-5|pages=23–28|edition=First|url=http://shop.oreilly.com/product/0636920038450.do|accessdate=June 21, 2016|language=English|chapter=Chapter 3: Understanding the MapR Distribution for Apache Hadoop}}</ref>
 
Ceph-FS is a distributed file system that provides excellent performance and reliability.<ref>{{harvnb|Weil|Brandt|Miller|Long|2006|p=307}}</ref> It addresses the challenges of handling huge files and directories, coordinating the activity of thousands of disks, providing parallel access to metadata on a massive scale, serving both scientific and general-purpose workloads, authenticating and encrypting on a large scale, and growing or shrinking dynamically in response to frequent device decommissioning, device failures, and cluster expansions.<ref>{{harvnb|Maltzahn|Molina-Estolano|Khurana|Nelson|2010|p=39}}</ref>