Distributed file system for cloud: Difference between revisions

Revision as of 18:50, 7 September 2023 by Stephenamills (Extended confirmed users; 1,403 edits): m Added category (Tag: Visual edit)
Latest revision as of 13:38, 29 July 2025 by Lynch44 (Extended confirmed users, Pending changes reviewers, Rollbackers, Temporary account IP viewers; 15,728 edits): m Reverted edit by 154.133.103.135 (talk) to last version by JCW-CleanerBot (Tags: Rollback, Mobile edit, Mobile web edit)
(11 intermediate revisions by 7 users not shown)
Line 42:
===== Load balancing =====
[[Load balancing (computing)|Load balancing]] is essential for efficient operation in distributed environments. It means distributing work among different servers,<ref>{{harvnb|Kai|Dayang|Hui|Yintang|2013|p=23}}</ref> fairly, in order to get more work done in the same amount of time and to serve clients faster. In a system containing N chunkservers in a cloud (N being 1000, 10000, or more), where a certain number of files are stored, each file is split into several parts or chunks of fixed size (for example, 64 megabytes), the load of each chunkserver being proportional to the number of chunks hosted by the server.<ref name="ReferenceA">{{harvnb|Hsiao|Chung|Shen|Chao|2013|p=2}}</ref> In a load-balanced cloud, resources can be efficiently used while maximizing the performance of MapReduce-based applications.
===== Load rebalancing =====

Line 100:
Distributed file systems can be optimized for different purposes. Some, such as those designed for internet services, including GFS, are optimized for scalability. Other designs for distributed file systems support performance-intensive applications usually executed in parallel.<ref>{{harvnb|Soares| Dantas†|de Macedo|Bauer|2013|p=158}}</ref> Some examples include: [[MapR FS|MapR File System]] (MapR-FS), [[Ceph (storage)|Ceph-FS]], [[BeeGFS|Fraunhofer File System (BeeGFS)]], [[Lustre (file system)|Lustre File System]], [[IBM General Parallel File System]] (GPFS), and [[Parallel Virtual File System]].
MapR-FS is a distributed file system that is the basis of the MapR Converged Platform, with capabilities for distributed file storage, a NoSQL database with multiple APIs, and an integrated message streaming system. MapR-FS is optimized for scalability, performance, reliability, and availability. Its file storage capability is compatible with the Apache Hadoop Distributed File System (HDFS) API but with several design characteristics that distinguish it from HDFS.
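The load model described under "Load balancing" above (fixed-size chunks, with each chunkserver's load proportional to the number of chunks it hosts) can be illustrated with a minimal Python sketch. The function names, the `(chunk_id, server_id)` placement format, and the imbalance ratio are illustrative assumptions and are not taken from GFS, HDFS, or any of the cited papers.

```python
from collections import Counter

# 64 megabytes, the example chunk size given in the text.
CHUNK_SIZE = 64 * 1024 * 1024


def chunk_count(file_size_bytes, chunk_size=CHUNK_SIZE):
    """Number of fixed-size chunks needed for a file (the last chunk may be partial)."""
    return (file_size_bytes + chunk_size - 1) // chunk_size


def server_loads(placement):
    """Count chunks per server.

    `placement` is an iterable of (chunk_id, server_id) pairs; this format is
    a hypothetical one chosen for the example.
    """
    return Counter(server for _chunk, server in placement)


def imbalance(loads, n_servers):
    """Heaviest server's load divided by the ideal (average) load.

    A value of 1.0 means perfectly balanced. This ratio is an assumed,
    illustrative metric, not one defined by the article.
    """
    total = sum(loads.values())
    if total == 0:
        return 1.0
    return max(loads.values()) / (total / n_servers)


if __name__ == "__main__":
    # A 200 MB file needs four 64 MB chunks (three full, one partial).
    n = chunk_count(200 * 1024 * 1024)
    # Hypothetical placement: two chunks on s1, one each on s2 and s3, none on s4.
    placement = [("c0", "s1"), ("c1", "s1"), ("c2", "s2"), ("c3", "s3")]
    loads = server_loads(placement)
    print(n, dict(loads), imbalance(loads, n_servers=4))
```

Under this simple model, a rebalancing step would move chunks from servers whose count exceeds the average toward servers below it, driving the ratio toward 1.0.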
Among the most notable differences are that MapR-FS is a fully read/write filesystem with metadata for files and directories distributed across the namespace, so there is no NameNode.<ref name="mapr-productivity">{{cite web|last1=Perez|first1=Nicolas|title=How MapR improves our productivity and simplifies our design|url=https://medium.com/@anicolaspp/how-mapr-improves-our-productivity-and-simplify-our-design-2d777ab53120#.mvr6mmydr|website=Medium|access-date=June 21, 2016|date=2016-01-02}}</ref><ref>{{cite web|last1=Woodie|first1=Alex|title=From Hadoop to Zeta: Inside MapR's Convergence Conversion|url=http://www.datanami.com/2016/03/08/from-hadoop-to-zeta-inside-maprs-convergence-conversion/|website=Datanami|publisher=Tabor Communications Inc.|access-date=June 21, 2016|date=2016-03-08}}</ref><ref>{{cite web|last1=Brennan|first1=Bob|title=Flash Memory Summit|url=https://www.youtube.com/watch?v=fOT63zR7PvU&t=1682|website=youtube|publisher=Samsung|access-date=June 21, 2016}}</ref><ref name="maprfs-video">{{cite web|last1=Srivas|first1=MC|title=MapR File System|url=https://www.youtube.com/watch?v=fP4HnvZmpZI|website=Hadoop Summit 2011|date=23 July 2011 |publisher=Hortonworks|access-date=June 21, 2016}}</ref><ref name="real-world-hadoop">{{cite book|last1=Dunning|first1=Ted|last2=Friedman|first2=Ellen|title=Real World Hadoop|date=January 2015|publisher=O'Reilly Media, Inc|location=Sebastopol, CA|isbn=978-1-4919-2395-5|pages=23–28|edition=First|chapter-url=http://shop.oreilly.com/product/0636920038450.do|access-date=June 21, 2016|language=en|chapter=Chapter 3: Understanding the MapR Distribution for Apache Hadoop}}</ref>
Ceph-FS is a distributed file system that provides excellent performance and reliability.<ref>{{harvnb|Weil|Brandt|Miller|Long|2006|p=307}}</ref> It answers the challenges of dealing with huge files and directories, coordinating the activity of thousands of disks, providing parallel access to metadata on a massive scale, manipulating both scientific and general-purpose workloads, authenticating and encrypting on a large scale, and increasing or decreasing dynamically due to frequent device decommissioning, device failures, and cluster expansions.<ref>{{harvnb|Maltzahn|Molina-Estolano|Khurana|Nelson|2010|p=39}}</ref>

Line 185:
== Bibliography ==
* {{cite book
| last1 = Andrew
| first1 = S.Tanenbaum
| last2 = Maarten
| first2 = Van Steen
| year = 2006
| title = Distributed systems principles and paradigms
| url = http://net.pku.edu.cn/~course/cs501/2011/resource/2006-Book-distributed%20systems%20principles%20and%20paradigms%202nd%20edition.pdf
| access-date = 2014-01-10
}}▼
| archive-date = 2013-08-20
| archive-url = https://web.archive.org/web/20130820190519/http://net.pku.edu.cn/~course/cs501/2011/resource/2006-Book-distributed%20systems%20principles%20and%20paradigms%202nd%20edition.pdf
| url-status = dead
▲ }}
* {{cite web
| first = Fabio
|last = Kon

Line 210 ⟶ 216:
* {{cite web
| last1 = Jacobi
| first1 = Tim-Daniel
| last2 = Lingemann
| first2 = Jan
| url = http://wr.informatik.uni-hamburg.de/_media/research/labs/2012/2012-10-tim-daniel_jacobi_jan_lingemann-evaluation_of_distributed_file_systems-report.pdf
| title = Evaluation of Distributed File Systems
| access-date = 2014-01-24
}}▼
| archive-date = 2014-02-03
| archive-url = https://web.archive.org/web/20140203140412/http://wr.informatik.uni-hamburg.de/_media/research/labs/2012/2012-10-tim-daniel_jacobi_jan_lingemann-evaluation_of_distributed_file_systems-report.pdf
| url-status = dead
▲ }}
# Architecture, structure, and design:
#* {{cite book

Line 230 ⟶ 240:
| year = 2012
| doi = 10.1109/ClusterW.2012.27
~~| others = Coll. of Comput. Sci. & Technol., Zhejiang Univ., Hangzhou, China~~
| s2cid = 12430485
| chapter = A Novel Scalable Architecture of Cloud Storage System for Small Files Based on P2P

Line 241 ⟶ 250:
| year = 2013
| doi = 10.1109/CTS.2013.6567222
~~| others = Information and Computer Science Department King Fahd University of Petroleum and Minerals~~
| s2cid = 45293053
| pages = 155–161

Line 253 ⟶ 261:
| year = 2012
| url = http://www.cs.rutgers.edu/~pxk/417/notes/16-dfs.pdf
| access-date = 2013-12-27
}}▼
| archive-date = 2013-12-27
| archive-url = https://web.archive.org/web/20131227152320/http://www.cs.rutgers.edu/~pxk/417/notes/16-dfs.pdf
| url-status = dead
▲ }}
#* {{cite conference
| last1 = Kobayashi
| first1 = K

Line 262 ⟶ 274:
| title = The Gfarm File System on Compute Clouds
| conference = Parallel and Distributed Processing Workshops and Phd Forum (IPDPSW), 2011 IEEE International Symposium on
| conference-url = ~~http~~https://ieeexplore.ieee.org/xpl/~~mostRecentIssue.jsp?punumber=~~conhome/6008655/proceeding
| doi = 10.1109/IPDPS.2011.255
~~| others = Grad. Sch. of Syst. & Inf. Eng., Univ. of Tsukuba, Tsukuba, Japan~~
}}
#* {{cite book

Line 272 ⟶ 283:
| year = 2012
| doi = 10.1109/ICAICT.2012.6398489
~~| others = Department of Computer Engineering Qafqaz University Baku, Azerbaijan~~
| s2cid = 6113112
| pages = 1–5

Line 288 ⟶ 298:
| first4 =Yu-Chang
| title = Load Rebalancing for Distributed File Systems in Clouds
| ~~periodical~~journal = IEEE Transactions on Parallel and Distributed Systems~~, IEEE Transactions on~~
| year = 2013
| doi = 10.1109/TPDS.2012.196
~~| others = National Cheng Kung University, Tainan~~
| s2cid = 11271386
| pages = 951–962

Line 309 ⟶ 318:
| year = 2013
| doi = 10.1109/INCoS.2013.14
~~| others = State Key Lab. of Integrated Service Networks, Xidian Univ., Xi'an, China~~
| s2cid = 14821266
| pages = 23–29

Line 331 ⟶ 339:
| year = 2008
| doi = 10.1109/NCM.2008.164
~~| others = Sch. of Bus. IT, Kookmin Univ., Seoul~~
| s2cid = 18933772
| pages = 400–405

Line 349 ⟶ 356:
| year = 2013
| doi = 10.1109/WETICE.2013.12
~~| others = nf. & Statistic Dept. (INE), Fed. Univ. of Santa Catarina (UFSC), Florianopolis, Brazil~~
| s2cid = 6155753
| pages = 158–163

Line 361 ⟶ 367:
| year = 2012
| doi = 10.1109/ICAICT.2012.6398484
| s2cid = 16674289▼
~~| others = Comput. Eng. Dept., Qafqaz Univ., Baku, Azerbaijan~~
▲ | s2cid = 16674289
| pages = 1–3
| chapter = Distributed file system as a basis of data-intensive computing

Line 373 ⟶ 378:
| year = 2003
| url = https://www.kernel.org/doc/ols/2003/ols2003-pages-380-386.pdf
| pages = 400–407▼
~~| others = Cluster File Systems, Inc.~~
▲ | pages = 400–407
}}
#* {{cite journal | last1 = Jones |first1 = Terry | last2 = Koniges |first2 = Alice |last3 = Yates |first3 = R. Kim | title = Performance of the IBM General Parallel File System | periodical = Parallel and Distributed Processing Symposium, 2000. IPDPS 2000. Proceedings. 14th International | url = https://computing.llnl.gov/code/sio/GPFS_performance.pdf |year = 2000 |access-date = 2014-01-24 ~~| others = Lawrence Livermore National Laboratory~~ |archive-date = 2013-02-26 |archive-url = https://web.archive.org/web/20130226053255/https://computing.llnl.gov/code/sio/GPFS_performance.pdf |url-status = dead }}
#* {{cite conference | last1 = Weil | first1 = Sage A. | last2 = Brandt | first2 = Scott A. | last3 = Miller | first3 = Ethan L. | last4 = Long | first4 = Darrell D. E.
| title = Ceph: A Scalable, High-Performance Distributed File System | year = 2006 | url = http://www.ssrc.ucsc.edu/Papers/weil-osdi06.pdf | conference = Proceedings of the 7th Conference on Operating Systems Design and Implementation (OSDI '06) |access-date = 2014-01-24 |archive-date = 2012-03-09 |archive-url = https://web.archive.org/web/20120309021423/http://www.ssrc.ucsc.edu/Papers/weil-osdi06.pdf |url-status = dead }}
#* {{cite report

Line 432 ⟶ 443:
| year = 2003
| doi = 10.1109/MASS.2003.1194865
| pages = 290–298▼
~~| others = Storage Syst. Res. Center, California Univ., Santa Cruz, CA, USA~~
▲ | pages = 290–298
| chapter = Efficient metadata management in large distributed storage systems
| isbn = 978-0-7695-1914-2

Line 489 ⟶ 499:
| year = 2011
| doi = 10.1109/SWS.2011.6101263
| s2cid = 14791637▼
~~| others = PCN&CAD Center, Beijing Univ. of Posts & Telecommun., Beijing, China~~
▲ | s2cid = 14791637
| pages = 16–20
| chapter = A carrier-grade service-oriented file storage architecture for cloud computing

Line 509 ⟶ 518:
| isbn = 978-1-58113-757-6
| s2cid =221261373
~~| chapter-url = https://www.semanticscholar.org/paper/7b56847e641168aed58f3603bc00af84d414c9aa~~
}}
# Security

Line 522 ⟶ 530:
| year = 2009
| doi = 10.1109/I-SPAN.2009.150
| pages = 4–16▼
~~| others = Dept. of Comput. Sci. & Software Eng., Univ. of Melbourne, Melbourne, VIC, Australia~~
▲ | pages = 4–16
| chapter = High-Performance Cloud Computing: A View of Scientific Applications
| isbn = 978-1-4244-5403-7

Line 565 ⟶ 572:
| year = 2012
| doi = 10.1109/MIC.2012.6273264
| s2cid = 40685246▼
~~| others = Comput. Coll., Northwestern Polytech. Univ., Xi'An, China~~
▲ | s2cid = 40685246
| pages = 327–331
| chapter = PsFS: A high-throughput parallel file system for secure Cloud Storage system

Line 580 ⟶ 586:
| last4 = Xue
| first4 = Lan
| title = Efficient Metadata Management in Large Distributed Storage Systems
| periodical = 11th NASA Goddard Conference on Mass Storage Systems and Technologies, San Diego, CA
| year = 2003
| url = http://www.ssrc.ucsc.edu/Papers/brandt-mss03.pdf
| access-date = 2013-12-27
~~| others = Storage Systems Research Center University of California, Santa Cruz~~
| archive-date = 2013-08-22
}}▼
| archive-url = https://web.archive.org/web/20130822213717/http://www.ssrc.ucsc.edu/Papers/brandt-mss03.pdf
| url-status = dead
▲ }}
#* {{cite journal
| author = Lori M. Kaufman
| s2cid = 16233643
| title =Data Security in the World of Cloud Computing
| ~~periodical~~journal = IEEE Security & Privacy~~, IEEE~~
| year = 2009
| doi = 10.1109/MSP.2009.87

Line 635 ⟶ 644:
| year = 2012
| doi = 10.1109/Grid.2012.17
| s2cid = 10778240▼
~~| others = Dept. of Comput. Sci., Hefei Univ. of Technol., Hefei, China~~
▲ | s2cid = 10778240
| pages = 12–21
| chapter = A Distributed Cache for Hadoop Distributed File System in Real-Time Cloud Services

Line 655 ⟶ 663:
| year = 2012
| doi = 10.1109/SC.Companion.2012.103
| s2cid = 5554936▼
~~| others = Dept. of Electr. & Comput. Eng., Purdue Univ., West Lafayette, IN, USA~~
▲ | s2cid = 5554936
| pages = 753–759
| chapter = Integrating High Performance File Systems in a Cloud Computing Environment

Line 673 ⟶ 680:
| year = 2012
| doi = 10.1109/ISPACS.2012.6473485
| s2cid = 18260943▼
~~| others = Dept. of Comput. Sci. & Inf. Eng., Nat. Central Univ., Taoyuan, Taiwan~~
▲ | s2cid = 18260943
| pages = 227–232
| chapter = Implement a reliable and secure cloud distributed file system

Line 691 ⟶ 697:
| year = 2012
| doi = 10.1109/WETICE.2012.104
| s2cid = 19798809▼
~~| others = Dept. of Electr., Electron. & Comput. Eng., Univ. of Catania, Catania, Italy~~
▲ | s2cid = 19798809
| pages = 173–178
| chapter = File System As-a-Service: Providing Transient and Consistent Views of Files to Cooperating Applications in Clouds

Line 718 ⟶ 723:
| year = 2008
| url = http://www.pewinternet.org/~/media//Files/Reports/2008/PIP_Cloud.Memo.pdf.pdf
| access-date = 2013-12-27
}}▼
| archive-date = 2013-07-12
| archive-url = https://web.archive.org/web/20130712182757/http://www.pewinternet.org/~/media//Files/Reports/2008/PIP_Cloud.Memo.pdf.pdf
| url-status = dead
▲ }}
#* {{cite journal
| last1 = Yau

Line 821 ⟶ 830:
| isbn = 978-1-4577-1904-2
}}
#* {{cite ~~journal~~book
| last1 = Qian
| first1 = Haiyang

Line 828 ⟶ 837:
| last3 = T.
| first3 = Trivedi
| title = 12th IFIP/IEEE International Symposium on Integrated Network Management (IM 2011) and Workshops
| ~~title~~chapter = A hierarchical model to evaluate quality of experience of online services hosted by cloud computing
| year = 2011
| doi = 10.1109/INM.2011.5990680
| pages = 105–112
~~| journal=Communications of the ACM~~| volume = 52 |number= 1
| isbn = 978-1-4244-9219-0
| citeseerx = 10.1.1.190.5148
| s2cid = 15912111
}}

Line 858 ⟶ 869:
| chapter = Provable data possession at untrusted stores
| isbn = 978-1-59593-703-2
| url = https://figshare.com/articles/journal_contribution/6469184
}}
#* {{cite book

Line 982 ⟶ 994:
| chapter = Provable data possession at untrusted stores
| isbn = 978-1-59593-703-2
| url = https://figshare.com/articles/journal_contribution/6469184
}}
# Synchronization

Line 995 ⟶ 1,008:
| doi = 10.1109/CLUSTERWKSP.2010.5613087
| pages = 1–4
~~| others =Inst. of Comput. Sci. (ICS), Found. for Res. & Technol. - Hellas (FORTH), Heraklion, Greece~~
| s2cid = 14577793
| chapter = Cloud-based synchronization of distributed file system hierarchies

Line 1,006 ⟶ 1,018:
| s2cid = 16233643
| title = Data Security in the World of Cloud Computing
| ~~periodical~~journal = IEEE Security & Privacy~~, IEEE~~
| year = 2009
| doi = 10.1109/MSP.2009.87

Line 1,042 ⟶ 1,054:
| year = 2011
| doi = 10.1109/3PGCIC.2011.37
| s2cid = 13393620▼
~~|others= Sch. of Electr. & Comput. Eng., Univ. of Tehran, Tehran, Iran~~
▲ | s2cid = 13393620
| pages =193–199
| chapter = Suitability of Cloud Computing for Scientific Data Analyzing Applications; an Empirical Study