Distributed file system for cloud: Difference between revisions

Note that when the master assigns the write operation to a replica, it increments the chunk version number and informs all of the replicas containing that chunk of the new version number. Chunk version numbers allow for update error detection: if a replica was not updated because its chunk server was down, its version number will be out of date.<ref>{{harvnb|Krzyzanowski|2012|p=5}}</ref>
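The version-number mechanism can be illustrated with a minimal Python sketch. This is not GFS source code; all class and method names here are hypothetical, and the sketch only shows how comparing a replica's recorded version against the master's current version reveals replicas that missed an update.

```python
# Illustrative sketch (hypothetical, not GFS code): stale-replica
# detection via chunk version numbers.

class Chunkserver:
    def __init__(self):
        self.alive = True
        self.versions = {}           # chunk_id -> last version number it heard

class Master:
    def __init__(self):
        self.chunk_version = {}      # chunk_id -> current version number
        self.replicas = {}           # chunk_id -> list of Chunkserver

    def grant_write(self, chunk_id):
        # Before a write, the master bumps the chunk's version number and
        # informs every reachable replica of the new number.
        new_version = self.chunk_version.get(chunk_id, 0) + 1
        self.chunk_version[chunk_id] = new_version
        for server in self.replicas[chunk_id]:
            if server.alive:
                server.versions[chunk_id] = new_version
        return new_version

    def stale_replicas(self, chunk_id):
        # A replica whose recorded version lags the master's missed at
        # least one write (e.g. its chunk server was down) and is stale.
        current = self.chunk_version[chunk_id]
        return [s for s in self.replicas[chunk_id]
                if s.versions.get(chunk_id, 0) < current]
```

A replica that was unreachable during a write keeps its old version number, so the master can exclude it from reads and schedule re-replication when the chunk server rejoins.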
 
Some new Google applications did not work well with the 64-megabyte chunk size. To solve that problem, GFS started in 2004 to implement the [[Bigtable]] approach.<ref>{{Cite web | url=https://arstechnica.com/business/2012/01/the-big-disk-drive-in-the-sky-how-the-giants-of-the-web-store-big-data/ | title=The Great Disk Drive in the Sky: How Web giants store big—and we mean big—data| date=2012-01-27}}</ref>
 
==== Hadoop distributed file system ====
 
{{abbr|HDFS|Hadoop Distributed File System}}, developed by the [[Apache Software Foundation]], is a distributed file system designed to hold very large amounts of data (terabytes or even petabytes). Its architecture is similar to that of GFS, i.e. a master/slave architecture. HDFS is normally installed on a cluster of computers.
The design of Hadoop is informed by Google's: the Google File System, Google MapReduce, and [[Bigtable]] are implemented by the Hadoop Distributed File System (HDFS), Hadoop MapReduce, and Hadoop Base (HBase), respectively.<ref>{{harvnb|Fan-Hsun|Chi-Yuan| Li-Der| Han-Chieh|2012|p=2}}</ref> Like GFS, HDFS is suited for scenarios with write-once-read-many file access, and it supports file appends and truncates in lieu of random reads and writes, to simplify data coherency issues.<ref>{{Cite web | url=http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html#Assumptions_and_Goals | title=Apache Hadoop 2.9.2 – HDFS Architecture}}</ref>
 
An HDFS cluster consists of a single NameNode and several DataNode machines. The NameNode, a master server, manages and maintains the metadata of the storage DataNodes in its RAM. DataNodes manage the storage attached to the nodes that they run on. The NameNode and DataNode are software components designed to run on everyday-use machines, which typically run under a GNU/Linux OS. Any machine that supports Java can run the NameNode or DataNode software.<ref>{{harvnb|Azzedin|2013|p=2}}</ref>
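The division of responsibilities can be sketched as follows. This is a hypothetical Python illustration, not Hadoop code: the NameNode keeps only metadata (file-to-block mappings and block locations) in memory, while DataNodes report which blocks they store and serve the actual data to clients directly.

```python
# Illustrative sketch (hypothetical, not Hadoop code): the NameNode
# tracks metadata only; DataNodes hold the file data.

class NameNode:
    def __init__(self):
        self.file_blocks = {}        # file path -> ordered list of block ids
        self.block_locations = {}    # block id -> set of DataNode names

    def create_file(self, path, block_ids):
        # Record the file's block layout; no file data passes through here.
        self.file_blocks[path] = list(block_ids)

    def report_block(self, datanode, block_id):
        # DataNodes periodically send block reports to the NameNode.
        self.block_locations.setdefault(block_id, set()).add(datanode)

    def locate(self, path):
        # A client asks the NameNode where each block lives, then reads
        # the blocks directly from the DataNodes listed.
        return [(b, sorted(self.block_locations.get(b, set())))
                for b in self.file_blocks[path]]
```

Because all metadata lives in the NameNode's RAM, a lookup never touches disk, but the single NameNode is also the reason the namespace size is bounded by one machine's memory.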
Distributed file systems can be optimized for different purposes. Some, such as those designed for internet services, including GFS, are optimized for scalability. Other designs for distributed file systems support performance-intensive applications usually executed in parallel.<ref>{{harvnb|Soares| Dantas†|de Macedo|Bauer|2013|p=158}}</ref> Some examples include: [[MapR FS|MapR File System]] (MapR-FS), [[Ceph (storage)|Ceph-FS]], [[BeeGFS|Fraunhofer File System (BeeGFS)]], [[Lustre (file system)|Lustre File System]], [[IBM General Parallel File System]] (GPFS), and [[Parallel Virtual File System]].
 
MapR-FS is a distributed file system that is the basis of the MapR Converged Platform, with capabilities for distributed file storage, a NoSQL database with multiple APIs, and an integrated message streaming system. MapR-FS is optimized for scalability, performance, reliability, and availability. Its file storage capability is compatible with the Apache Hadoop Distributed File System (HDFS) API, but it has several design characteristics that distinguish it from HDFS. Among the most notable differences is that MapR-FS is a fully read/write filesystem with metadata for files and directories distributed across the namespace, so there is no NameNode.<ref name="mapr-productivity">{{cite web|last1=Perez|first1=Nicolas|title=How MapR improves our productivity and simplifies our design|url=https://medium.com/@anicolaspp/how-mapr-improves-our-productivity-and-simplify-our-design-2d777ab53120#.mvr6mmydr|website=Medium|publisher=Medium|accessdate=June 21, 2016|date=2016-01-02}}</ref><ref>{{cite web|last1=Woodie|first1=Alex|title=From Hadoop to Zeta: Inside MapR's Convergence Conversion|url=http://www.datanami.com/2016/03/08/from-hadoop-to-zeta-inside-maprs-convergence-conversion/|website=Datanami|publisher=Tabor Communications Inc.|accessdate=June 21, 2016|date=2016-03-08}}</ref><ref>{{cite web|last1=Brennan|first1=Bob|title=Flash Memory Summit|url=https://www.youtube.com/watch?v=fOT63zR7PvU&t=1682|website=YouTube|publisher=Samsung|accessdate=June 21, 2016}}</ref><ref name="maprfs-video">{{cite web|last1=Srivas|first1=MC|title=MapR File System|url=https://www.youtube.com/watch?v=fP4HnvZmpZI|website=Hadoop Summit 2011|publisher=Hortonworks|accessdate=June 21, 2016}}</ref><ref name="real-world-hadoop">{{cite book|last1=Dunning|first1=Ted|last2=Friedman|first2=Ellen|title=Real World Hadoop|date=January 2015|publisher=O'Reilly Media, Inc|___location=Sebastopol, CA|isbn=978-1-4919-2395-5|pages=23–28|edition=First|chapter-url=http://shop.oreilly.com/product/0636920038450.do|accessdate=June 21, 2016|language=English|chapter=Chapter 3: Understanding the MapR Distribution for Apache Hadoop}}</ref>
 
Ceph-FS is a distributed file system that provides excellent performance and reliability.<ref>{{harvnb|Weil|Brandt|Miller|Long|2006|p=307}}</ref> It addresses the challenges of dealing with huge files and directories, coordinating the activity of thousands of disks, providing parallel access to metadata on a massive scale, handling both scientific and general-purpose workloads, authenticating and encrypting at large scale, and growing or shrinking dynamically owing to frequent device decommissioning, device failures, and cluster expansions.<ref>{{harvnb|Maltzahn|Molina-Estolano|Khurana|Nelson|2010|p=39}}</ref>
| url=http://net.pku.edu.cn/~course/cs501/2011/resource/2006-Book-distributed%20systems%20principles%20and%20paradigms%202nd%20edition.pdf
}}
* {{cite journal
| ref=harv
| author = Fabio Kon
| url = http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.42.4609
| title = Distributed File Systems, The State of the Art and concept of Ph.D. Thesis
| citeseerx = 10.1.1.42.4609
| year = 1996
}}
* {{cite web
| ref=harv
| pages = 290–298
| chapter = Efficient metadata management in large distributed storage systems
| isbn = 978-0-7695-1914-2
| citeseerx = 10.1.1.13.2537
}}
#* {{cite journal
| periodical = Communications of the ACM
| volume = 43
| pages = 37–45
| number = 11
| date = November 2000
| year = 2011
| eprint=1112.2025
| class = cs.DC
}}
#* {{cite book
| pages = 29–43
| chapter = The Google file system
| isbn = 978-1-58113-757-6
}}
# Security
| chapter = High-Performance Cloud Computing: A View of Scientific Applications
| isbn = 978-1-4244-5403-7
| arxiv = 0910.1979
}}
#* {{cite book
| chapter = Can homomorphic encryption be practical?
| isbn = 978-1-4503-1004-8
| citeseerx = 10.1.1.225.8007
}}
#* {{cite book
| issue = 4
}}
#* {{cite book
| ref=harv
| last1 = Bowers
| first3 =Alina
| title = HAIL: a high-availability and integrity layer for cloud storage
| periodical = Proceedings of the 16th ACM Conference on Computer and Communications Security
| year = 2009
| doi = 10.1145/1653662.1653686
| doi = 10.1145/2408776.2408793
| pages = 64–73
| journal=Communications of the ACM| volume = 56 |number= 2 |date=February 2013
}}
#* {{cite book
| isbn = 978-1-4673-2901-9
}}
#* {{cite book
| ref=harv
| last1 = Abu-Libdeh
| first3 = Hakim
| title = RACS: a case for cloud storage diversity
| periodical = SoCC '10 Proceedings of the 1st ACM Symposium on Cloud Computing
| year = 2010
| doi = 10.1145/1807128.1807165
| doi = 10.1145/1435417.1435432
| pages = 40–44
| journal=Communications of the ACM| volume = 52 |number= 1
}}
#* {{cite book
| doi = 10.1109/INM.2011.5990680
| pages = 105–112
| citeseerx = 10.1.1.190.5148
}}
#* {{cite book
| ref=harv
| chapter = Scalable and efficient provable data possession
| isbn = 978-1-60558-241-2
| citeseerx = 10.1.1.208.8270
}}
#* {{cite book
| isbn = 978-1-60558-894-0
}}
#* {{cite book
| ref=harv
| last1 = Juels
| first2 = Burton
| title = PORs: proofs of retrievability for large files
| periodical = Proceedings of the 14th ACM Conference on Computer and Communications Security
| year = 2007
| doi = 10.1145/1315245.1315317
| title = Consistency rationing in the cloud: pay only when it matters
| year = 2009
| url = http://dl.acm.org/citation.cfm?doid=1687627.1687657
| pages = 253–264
| journal=Proceedings of the VLDB Endowment| volume = 2 |issue= 1|doi=10.14778/1687627.1687657
}}
#* {{cite journal