Distributed file system for cloud
{{Short description|File system that allows many clients to have access}}
{{Orphan|date=December 2013}}
A '''distributed file system for cloud''' is a [[w:file system|file system]] that allows many clients to have access to data and supports operations (create, delete, modify, read, write) on that data. Each data file may be partitioned into several parts called [[Chunk (information)|chunks]]. Each chunk may be stored on different remote machines, facilitating the parallel execution of applications. Typically, data is stored in files in a [[Hierarchical tree structure|hierarchical tree]], where the nodes represent directories. There are several ways to share files in a distributed architecture: each solution must be suitable for a certain type of application, depending on how complex the application is. Meanwhile, the security of the system must be ensured. [[w:Confidentiality|Confidentiality]], [[w:Availability|availability]] and [[w:Integrity|integrity]] are the main keys for a secure system.
 
Users can share computing resources through the [[Internet]] thanks to [[cloud computing]] which is typically characterized by [[w:Scalability|scalable]] and [[w:Elasticity (cloud computing)|elastic]] resources – such as physical [[w:Server (computing)|servers]], applications and any services that are [[w:Virtualization|virtualized]] and allocated dynamically. [[w:Synchronization|Synchronization]] is required to make sure that all devices are up-to-date.
 
Distributed file systems enable many big, medium, and small enterprises to store and access their remote data as they do local data, facilitating the use of variable resources.
==Overview==
 
=== History ===
Today, there are many implementations of distributed file systems. The first file servers were developed by researchers in the 1970s. Sun Microsystems' [[Network File System]] became available in the 1980s. Before that, people who wanted to share files used the [[sneakernet]] method, physically transporting files on storage media from place to place. Once computer networks started to proliferate, it became obvious that the existing file systems had many limitations and were unsuitable for multi-user environments. Users initially used [[FTP]] to share files.<ref>{{harvnb|Sun microsystem|p=1}}</ref> FTP first ran on the [[PDP-10]] at the end of 1973. Even with FTP, files needed to be copied from the source computer onto a server and then from the server onto the destination computer. Users were required to know the physical addresses of all computers involved with the file sharing.<ref>{{harvnb|Kon|1996|p=1}}</ref>
 
=== Supporting techniques ===
Modern data centers must support large, heterogeneous environments, consisting of large numbers of computers of varying capacities. Cloud computing coordinates the operation of all such systems, with techniques such as [[Data center network architectures|data center networking]] (DCN), the [[w:MapReduce|MapReduce]] framework, which supports [[w:Data-intensive computing|data-intensive computing]] applications in parallel and distributed systems, and [[virtualization]] techniques that provide dynamic resource allocation, allowing multiple operating systems to coexist on the same physical server.
 
=== Applications ===
[[Cloud computing]] provides large-scale computing thanks to its ability to provide the needed CPU and storage resources to the user with complete transparency. This makes cloud computing particularly suited to support different types of applications that require large-scale distributed processing. This [[w:Data-intensive computing|data-intensive computing]] needs a high performance [[file system]] that can share data between [[virtual machine]]s (VMs).<ref>{{harvnb|Kobayashi|Mikami|Kimura|Tatebe|2011|p=1}}</ref>
 
Cloud computing dynamically allocates the needed resources, releasing them once a task is finished, requiring users to pay only for needed services, often via a [[service-level agreement]]. Cloud computing and [[Computer cluster|cluster computing]] paradigms are becoming increasingly important to industrial data processing and scientific applications such as [[astronomy]] and physics, which frequently require the availability of large numbers of computers to carry out experiments.<ref>{{harvnb|Angabini|Yazdani|Mundt|Hassani |2011|p=1}}</ref>
 
== Architectures ==
Most distributed file systems are built on the client-server architecture, but other, decentralized, solutions exist as well.
 
=== Client-server architecture ===
[[Network File System]] (NFS) uses a [[client-server architecture]], which allows sharing of files between a number of machines on a network as if they were located locally, providing a standardized view. The NFS protocol allows heterogeneous clients' processes, probably running on different machines and under different operating systems, to access files on a distant server, ignoring the actual ___location of files. Relying on a single server results in the NFS protocol suffering from potentially low availability and poor scalability. Using multiple servers does not solve the availability problem since each server is working independently.<ref>{{harvnb|Di Sano| Di Stefano|Morana|Zito|2012|p=2}}</ref> The model of NFS is a remote file service. This model is also called the remote access model, which is in contrast with the upload/download model:
* Remote access model: Provides transparency; the client has access to a file and sends requests to the remote file, while the file remains on the server.<ref>{{harvnb|Andrew|Maarten|2006|p=492}}</ref>
* Upload/download model: The client can access the file only locally: the client has to download the file, make modifications, and upload it again so it can be used by other clients.
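The contrast between the two models can be sketched in a few lines of Python. This is an illustrative toy, not NFS itself; the class and method names are hypothetical.

```python
# Hypothetical sketch contrasting the two file-service models described above.

class Server:
    """A toy file server holding files in a dict."""
    def __init__(self):
        self.files = {"notes.txt": "v1"}

    # Remote access model: every operation is a request; the file stays remote.
    def read(self, name):
        return self.files[name]

    def write(self, name, data):
        self.files[name] = data

    # Upload/download model: the whole file moves to the client and back.
    def download(self, name):
        return self.files[name]

    def upload(self, name, data):
        self.files[name] = data

server = Server()

# Remote access: the client never holds the authoritative copy.
server.write("notes.txt", "v2")
assert server.read("notes.txt") == "v2"

# Upload/download: edit a local copy, then push it back for other clients.
local_copy = server.download("notes.txt")
local_copy += " + local edit"
server.upload("notes.txt", local_copy)
assert server.read("notes.txt") == "v2 + local edit"
```

The difference matters for consistency: in the upload/download model, other clients see changes only after the modified file is uploaded again.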
 
The file system used by NFS is almost the same as the one used by [[Unix]] systems. Files are hierarchically organized into a naming graph in which directories and files are represented by nodes.
 
=== Cluster-based architectures ===
A [[Clustered file system|cluster-based architecture]] ameliorates some of the issues in client-server architectures, improving the execution of applications in parallel. The technique used here is file-striping: a file is split into multiple chunks, which are "striped" across several storage servers. The goal is to allow access to different parts of a file in parallel. If the application does not benefit from this technique, then it would be more convenient to store different files on different servers. However, when it comes to organizing a distributed file system for large data centers, such as Amazon and Google, that offer services to web clients allowing multiple operations (reading, updating, deleting,...) to a large number of files distributed among a large number of computers, then cluster-based solutions become more beneficial. Note that having a large number of computers may mean more hardware failures.<ref>{{harvnb|Andrew |Maarten |2006|p=496}}</ref> Two of the most widely used distributed file systems (DFS) of this type are the [[Google File System]] (GFS) and the [[Apache Hadoop|Hadoop Distributed File System]] (HDFS). The file systems of both are implemented by user level processes running on top of a standard operating system ([[Linux]] in the case of GFS).<ref>{{harvnb|Humbetov|2012|p=2}}</ref>
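The file-striping technique can be illustrated with a short sketch. This is an assumption-laden toy (tiny chunk size, round-robin placement), not any particular DFS's implementation.

```python
# Hypothetical file-striping sketch: a file is cut into fixed-size chunks that
# are distributed round-robin ("striped") across several storage servers, so
# different parts of the file can be read in parallel.

CHUNK_SIZE = 4  # bytes, tiny for illustration; GFS uses 64 MB chunks

def stripe(data: bytes, servers: list):
    """Split data into chunks and assign chunk i to server i mod len(servers)."""
    placement = []
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        server = servers[(i // CHUNK_SIZE) % len(servers)]
        placement.append((server, chunk))
    return placement

placement = stripe(b"abcdefghij", ["s0", "s1", "s2"])
assert placement == [("s0", b"abcd"), ("s1", b"efgh"), ("s2", b"ij")]
```

Because consecutive chunks land on different servers, a client can fetch them concurrently, which is the point of striping.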
 
==== Design principles ====
===== Goals =====
[[Google File System]] (GFS) and [[Hadoop Distributed File System]] (HDFS) are specifically built for handling [[batch processing]] on very large data sets.
For that, the following hypotheses must be taken into account:<ref name="Krzyzanowski_p2" />
* High availability: the [[Computer cluster|cluster]] can contain thousands of file servers and some of them can be down at any time
* A server belongs to a rack, a room, a data center, a country, and a continent, in order to precisely identify its geographical ___location
* The size of a file can vary from many gigabytes to many terabytes. The file system should be able to support a massive number of files
* The need to support append operations and allow file contents to be visible even while a file is being written
* Communication is reliable among working machines: [[Transmission Control Protocol|TCP/IP]] is used with a [[Remote procedure call|remote procedure call RPC]] communication abstraction. TCP allows the client to know almost immediately when there is a problem and a need to make a new connection.<ref>{{harvnb|Pavel Bžoch |p=7}}</ref>
 
===== Load balancing =====
[[Load balancing (computing)|Load balancing]] is essential for efficient operation in distributed environments. It means distributing work among different servers,<ref>{{harvnb|Kai|Dayang|Hui|Yintang|2013|p=23}}</ref> fairly, in order to get more work done in the same amount of time and to serve clients faster. In a system containing N chunkservers in a cloud (N being 1000, 10000, or more), where a certain number of files are stored, each file is split into several parts or chunks of fixed size (for example, 64 megabytes), the load of each chunkserver being proportional to the number of chunks hosted by the server.<ref name="ReferenceA">{{harvnb|Hsiao|Chung|Shen|Chao|2013|p=2}}</ref> In a load-balanced cloud, resources can be efficiently used while maximizing the performance of MapReduce-based applications.
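Since the load of a chunkserver is taken to be proportional to the number of chunks it hosts, a simple placement policy puts each new chunk on the least-loaded server. The sketch below is an illustrative assumption, not the policy of any specific system.

```python
# Illustrative load-balancing sketch: each chunkserver's load is measured by
# the number of chunks it hosts; a new chunk goes to the least-loaded server.

def least_loaded(chunk_counts: dict) -> str:
    """Return the server currently hosting the fewest chunks."""
    return min(chunk_counts, key=chunk_counts.get)

servers = {"cs1": 120, "cs2": 95, "cs3": 130}
target = least_loaded(servers)
assert target == "cs2"
servers[target] += 1  # placing the new chunk raises cs2's load to 96
assert servers["cs2"] == 96
```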
 
===== Load rebalancing =====
In a cloud computing environment, failure is the norm,<ref>{{harvnb|Hsiao|Chung|Shen|Chao|2013|p=952}}</ref><ref>{{harvnb|Ghemawat|Gobioff|Leung|2003|p=1}}</ref> and chunkservers may be upgraded, replaced, and added to the system. Files can also be dynamically created, deleted, and appended. That leads to load imbalance in a distributed file system, meaning that the file chunks are not distributed equitably between the servers.
 
Distributed file systems in clouds such as GFS and HDFS rely on central or master servers or nodes (Master for GFS and NameNode for HDFS) to manage the metadata and the load balancing. The master rebalances replicas periodically: data must be moved from one DataNode/chunkserver to another if free space on the first server falls below a certain threshold.<ref>{{harvnb|Ghemawat|Gobioff|Leung|2003|p=8}}</ref> However, this centralized approach can become a bottleneck for those master servers, if they become unable to manage a large number of file accesses, as it increases their already heavy loads. The load rebalance problem is [[w:NP-hard|NP-hard]].<ref>{{harvnb|Hsiao|Chung|Shen|Chao|2013|p=953}}</ref>
 
In order to get a large number of chunkservers to work in collaboration, and to solve the problem of load balancing in distributed file systems, several approaches have been proposed, such as reallocating file chunks so that the chunks can be distributed as uniformly as possible while reducing the movement cost as much as possible.<ref name="ReferenceA" />
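One very simplified way to see the trade-off between uniformity and movement cost is a greedy scheme that moves one chunk at a time from the most loaded to the least loaded server. This is a sketch under that assumption, not one of the published rebalancing algorithms.

```python
# Hypothetical greedy rebalancing sketch: repeatedly move one chunk from the
# most loaded chunkserver to the least loaded one until loads differ by at
# most one chunk, counting each move (the "movement cost" to be kept low).

def rebalance(loads: dict) -> int:
    """Equalize chunk counts in-place; return the number of chunk moves."""
    moves = 0
    while True:
        hi = max(loads, key=loads.get)
        lo = min(loads, key=loads.get)
        if loads[hi] - loads[lo] <= 1:
            return moves
        loads[hi] -= 1   # move one chunk off the overloaded server...
        loads[lo] += 1   # ...onto the underloaded one
        moves += 1

loads = {"cs1": 10, "cs2": 2, "cs3": 6}
cost = rebalance(loads)
assert max(loads.values()) - min(loads.values()) <= 1
assert cost == 4
```

Real schemes are far more involved (chunks have replicas and placement constraints), which is why the general problem is NP-hard, as noted above.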
 
==== Google file system ====
{{main|Google File System}}
 
===== Description =====
Google, one of the biggest internet companies, has created its own distributed file system, named Google File System (GFS), to meet the rapidly growing demands of Google's data processing needs, and it is used for all cloud services. GFS is a scalable distributed file system for data-intensive applications. It provides fault-tolerant, high-performance data storage to a large number of clients accessing it simultaneously.
 
GFS uses [[MapReduce]], which allows users to create programs and run them on multiple machines without thinking about parallelization and load-balancing issues. GFS architecture is based on having a single master server for multiple chunkservers and multiple clients.<ref>{{harvnb|Di Sano|Di Stefano|Morana|Zito|2012|pp=1–2}}</ref>
 
The master server, running in a dedicated node, is responsible for coordinating storage resources and managing files' [[metadata]] (the equivalent of, for example, inodes in classical file systems).<ref name="Krzyzanowski_p2">{{harvnb|Krzyzanowski|2012|p=2}}</ref>
Each file is split into multiple chunks of 64 megabytes. Each chunk is stored in a chunk server. A chunk is identified by a chunk handle, which is a globally unique 64-bit number that is assigned by the master when the chunk is first created.
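The master's assignment of globally unique 64-bit chunk handles can be sketched as follows. The counter-based scheme and all names here are illustrative assumptions, not GFS's actual implementation.

```python
# Sketch of a master assigning globally unique 64-bit chunk handles when
# chunks are first created, as described above. The counter scheme is a
# hypothetical stand-in for however GFS actually generates handles.

import itertools

class Master:
    def __init__(self):
        self._next = itertools.count(1)
        self.chunks = {}  # handle -> (file name, chunk index)

    def create_chunk(self, filename: str, index: int) -> int:
        handle = next(self._next) & 0xFFFFFFFFFFFFFFFF  # keep within 64 bits
        self.chunks[handle] = (filename, index)
        return handle

master = Master()
h0 = master.create_chunk("web.log", 0)   # first 64 MB of web.log
h1 = master.create_chunk("web.log", 1)   # second 64 MB of web.log
assert h0 != h1                          # handles are globally unique
assert master.chunks[h1] == ("web.log", 1)
```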
 
The master maintains all of the files' metadata, including file names, directories, and the mapping of files to the list of chunks that contain each file's data. The metadata is kept in the master server's main memory, along with the mapping of files to chunks. Updates to this data are logged to an operation log on disk. This operation log is replicated onto remote machines. When the log becomes too large, a checkpoint is made and the main-memory data is stored in a [[B-tree]] structure to facilitate mapping back into the main memory.<ref>{{harvnb|Krzyzanowski|2012|p=4}}</ref>
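The interplay of in-memory metadata, operation log, and checkpoint can be sketched as below. This is a minimal toy under stated assumptions (a dict instead of a B-tree, a list instead of an on-disk log), not GFS's data structures.

```python
# Illustrative sketch of the metadata scheme described above: updates go to
# in-memory state AND an operation log; when the log grows too large, a
# checkpoint of the in-memory state is taken and the log is truncated.

class Metadata:
    def __init__(self, max_log=3):
        self.files = {}        # file name -> list of chunk handles (in memory)
        self.log = []          # operation log (on disk, replicated, in GFS)
        self.checkpoint = {}   # last checkpointed state
        self.max_log = max_log

    def append_chunk(self, name, handle):
        self.files.setdefault(name, []).append(handle)
        self.log.append(("append", name, handle))
        if len(self.log) >= self.max_log:   # log too large: checkpoint
            self.checkpoint = {k: list(v) for k, v in self.files.items()}
            self.log.clear()

md = Metadata(max_log=3)
for h in (11, 12, 13):
    md.append_chunk("data.bin", h)
assert md.log == []                          # log truncated at the checkpoint
assert md.checkpoint["data.bin"] == [11, 12, 13]
```

After a crash, state can be rebuilt from the last checkpoint plus any log entries recorded after it, which is why the log may be kept short.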
 
===== Fault tolerance =====
To facilitate [[fault tolerance]], each chunk is replicated onto multiple (by default, three) chunk servers.<ref>{{harvnb|Di Sano|Di Stefano|Morana|Zito|2012|p=2}}</ref> A chunk is available on at least one chunk server. The advantage of this scheme is simplicity. The master is responsible for allocating the chunk servers for each chunk and is contacted only for metadata information. For all other data, the client has to interact with the chunk servers.
 
The master keeps track of where a chunk is located. However, it does not attempt to maintain the chunk locations precisely but only occasionally contacts the chunk servers to see which chunks they have stored.<ref>{{harvnb|Andrew |Maarten |2006|p=497}}</ref> This allows for scalability, and helps prevent bottlenecks due to increased workload.<ref>{{harvnb|Humbetov|2012|p=3}}</ref>
 
In GFS, most files are modified by appending new data rather than overwriting existing data. Once written, the files are usually only read, and often only sequentially rather than randomly, which makes this DFS the most suitable for scenarios in which many large files are created once but read many times.<ref>{{harvnb|Humbetov|2012|p=5}}</ref><ref>{{harvnb|Andrew|Maarten|2006|p=498}}</ref>
 
===== File processing =====
When a client wants to write-to/update a file, the master will assign a replica, which will be the primary replica if it is the first modification. The process of writing is composed of two steps:<ref name="Krzyzanowski_p2" />
* Sending: First, and by far the most important, the client contacts the master to find out which chunk servers hold the data. The client is given a list of replicas identifying the primary and secondary chunk servers. The client then contacts the nearest replica chunk server, and sends the data to it. This server will send the data to the next closest one, which then forwards it to yet another replica, and so on. The data is then propagated and cached in memory but not yet written to a file.
* Writing: When all the replicas have received the data, the client sends a write request to the primary chunk server, identifying the data that was sent in the sending phase. The primary server will then assign a sequence number to the write operations that it has received, apply the writes to the file in serial-number order, and forward the write requests in that order to the secondaries. Meanwhile, the master is kept out of the loop.
 
Consequently, we can differentiate two types of flows: the data flow and the control flow. Data flow is associated with the sending phase and control flow is associated with the writing phase. This assures that the primary chunk server takes control of the write order.
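The two phases above can be sketched in miniature: data is first propagated along the chain of replicas and only cached, then the primary assigns serial numbers so every replica applies writes in the same order. All names are illustrative, not GFS's actual interfaces.

```python
# Sketch of the two-phase write: the sending phase caches data at every
# replica; the writing phase has the primary serialize the writes so all
# replicas apply them in the same order.

class Replica:
    def __init__(self):
        self.cache = []    # data received during the sending phase
        self.file = []     # data actually written, in serial order

replicas = [Replica() for _ in range(3)]   # replicas[0] acts as the primary

def send(data):
    """Data flow: propagate along the chain of replicas, caching only."""
    for r in replicas:
        r.cache.append(data)

def write(data):
    """Control flow: the primary picks the serial order, secondaries follow."""
    serial = len(replicas[0].file)         # next position, chosen by primary
    for r in replicas:
        r.cache.remove(data)
        r.file.insert(serial, data)

send("A"); send("B")
write("B"); write("A")    # the primary's order wins, not the arrival order
assert all(r.file == ["B", "A"] for r in replicas)
assert all(r.cache == [] for r in replicas)
```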
Note that when the master assigns the write operation to a replica, it increments the chunk version number and informs all of the replicas containing that chunk of the new version number. Chunk version numbers allow for update error-detection, if a replica wasn't updated because its chunk server was down.<ref>{{harvnb|Krzyzanowski|2012|p=5}}</ref>
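Version-based staleness detection can be shown with a small sketch: the master bumps the chunk's version and informs only the reachable replicas, so a replica whose server was down is later recognizable as out of date. The dictionaries here are hypothetical stand-ins.

```python
# Illustrative sketch of chunk version numbers: replicas that missed a
# version bump (their chunk server was down) are detected as stale.

master_version = {"chunk42": 1}
replica_versions = {"cs1": 1, "cs2": 1, "cs3": 1}

# The master grants a write while cs3 is down: it increments the version
# and informs only the reachable replicas.
master_version["chunk42"] += 1
for server in ("cs1", "cs2"):
    replica_versions[server] = master_version["chunk42"]

stale = [s for s, v in replica_versions.items()
         if v < master_version["chunk42"]]
assert stale == ["cs3"]   # cs3's copy is detected as out of date
```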
 
Some new Google applications did not work well with the 64-megabyte chunk size. To solve that problem, GFS started, in 2004, to implement the [[Bigtable]] approach.<ref>{{Cite web | url=https://arstechnica.com/business/2012/01/the-big-disk-drive-in-the-sky-how-the-giants-of-the-web-store-big-data/ | title=The Great Disk Drive in the Sky: How Web giants store big—and we mean big—data | date=2012-01-27}}</ref>
 
==== Hadoop distributed file system ====
{{main|Apache Hadoop}}
 
{{abbr|HDFS|Hadoop Distributed File System}}, developed by the [[Apache Software Foundation]], is a distributed file system designed to hold very large amounts of data (terabytes or even petabytes). Its architecture is similar to that of GFS, i.e. a client-server architecture. HDFS is normally installed on a cluster of computers.
The design concept of Hadoop is informed by Google's, with Google File System, Google MapReduce and [[Bigtable]], being implemented by Hadoop Distributed File System (HDFS), Hadoop MapReduce, and Hadoop Base (HBase) respectively.<ref>{{harvnb|Fan-Hsun|Chi-Yuan| Li-Der| Han-Chieh|2012|p=2}}</ref> Like GFS, HDFS is suited for scenarios with write-once-read-many file access, and supports file appends and truncates in lieu of random reads and writes to simplify data coherency issues.<ref>{{Cite web | url=http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html#Assumptions_and_Goals | title=Apache Hadoop 2.9.2 – HDFS Architecture}}</ref>
 
An HDFS cluster consists of a single NameNode and several DataNode machines. The NameNode, a master server, manages and maintains the metadata of storage DataNodes in its RAM. DataNodes manage storage attached to the nodes that they run on. NameNode and DataNode are software designed to run on everyday-use machines, which typically run under a Linux OS. HDFS can be run on any machine that supports Java and therefore can run either the NameNode or the DataNode software.<ref>{{harvnb|Azzedin|2013|p=2}}</ref>
 
On an HDFS cluster, a file is split into one or more equal-size blocks, except for the possibility of the last block being smaller. Each block is stored on multiple DataNodes, and each may be replicated on multiple DataNodes to guarantee availability. By default, each block is replicated three times, a process called "Block Level Replication".<ref name="admaov_2">{{harvnb|Adamov|2012|p=2}}</ref>
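Block splitting and replica placement can be sketched as below. The tiny block size and the rotating placement rule are illustrative assumptions, not HDFS's actual placement policy (which is rack-aware).

```python
# Sketch of HDFS-style block-level replication: a file is cut into equal-size
# blocks (the last may be smaller) and each block is assigned to `replication`
# distinct DataNodes (three by default).

BLOCK_SIZE = 5  # tiny for illustration; HDFS defaults to much larger blocks

def place_blocks(size: int, datanodes: list, replication: int = 3):
    """Return, for each block of a `size`-byte file, its list of DataNodes."""
    n_blocks = (size + BLOCK_SIZE - 1) // BLOCK_SIZE  # last block may be short
    placement = []
    for b in range(n_blocks):
        # rotate through the DataNodes so replicas land on distinct nodes
        nodes = [datanodes[(b + r) % len(datanodes)] for r in range(replication)]
        placement.append(nodes)
    return placement

placement = place_blocks(12, ["dn1", "dn2", "dn3", "dn4"])
assert len(placement) == 3                       # 12 bytes -> blocks of 5, 5, 2
assert placement[0] == ["dn1", "dn2", "dn3"]
assert all(len(set(nodes)) == 3 for nodes in placement)
```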
 
The NameNode manages the file system namespace operations such as opening, closing, and renaming files and directories, and regulates file access. It also determines the mapping of blocks to DataNodes. The DataNodes are responsible for servicing read and write requests from the file system's clients, managing block allocation or deletion, and replicating blocks.<ref>{{harvnb|Yee|Thu Naing|2011|p=122}}</ref>
 
When a client wants to read or write data, it contacts the NameNode and the NameNode checks where the data should be read from or written to. After that, the client has the ___location of the DataNode and can send read or write requests to it.
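The read path just described can be sketched as follows; the class and method names here are illustrative stand-ins, not the actual HDFS client API:

```python
# Sketch of the client read path: the client asks a NameNode-like master
# for block locations, then reads each block from one of its DataNodes.
# All names are illustrative, not the real HDFS API.

class NameNode:
    def __init__(self):
        # filename -> list of (block_id, [datanode ids holding a replica])
        self.block_map = {}

    def add_file(self, name, blocks):
        self.block_map[name] = blocks

    def get_block_locations(self, name):
        return self.block_map[name]

class DataNode:
    def __init__(self):
        self.blocks = {}  # block_id -> bytes

    def read_block(self, block_id):
        return self.blocks[block_id]

def client_read(namenode, datanodes, name):
    """Reassemble a file by fetching each block from the first replica."""
    data = b""
    for block_id, locations in namenode.get_block_locations(name):
        dn = datanodes[locations[0]]  # pick the first listed replica
        data += dn.read_block(block_id)
    return data
```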
 
HDFS is typically characterized by its compatibility with data rebalancing schemes. In general, managing the free space on a DataNode is very important. Data must be moved from one DataNode to another if its free space is not adequate; and in the case of creating additional replicas, data should be moved to assure system balance.<ref name="admaov_2" />
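The free-space rule above can be illustrated with a toy rebalancer; the threshold policy and data layout are assumptions for illustration, not the real HDFS balancer:

```python
# Toy rebalancer: move blocks off any node whose free space falls below
# a threshold, onto the node with the most free space. Illustrative only.

def rebalance(nodes, min_free):
    """nodes: dict name -> {'free': int, 'blocks': list of dicts}.
    Returns a list of (block_id, source, destination) moves."""
    moves = []
    for name, node in nodes.items():
        while node['free'] < min_free and node['blocks']:
            # choose the destination with the most free space
            dest = max((n for n in nodes if n != name),
                       key=lambda n: nodes[n]['free'])
            if nodes[dest]['free'] <= node['free']:
                break  # no better-off node exists; stop moving
            block = node['blocks'].pop()
            nodes[dest]['blocks'].append(block)
            node['free'] += block['size']
            nodes[dest]['free'] -= block['size']
            moves.append((block['id'], name, dest))
    return moves
```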
 
=====Other examples=====
Distributed file systems can be optimized for different purposes. Some, such as those designed for internet services, including GFS, are optimized for scalability. Other designs for distributed file systems support performance-intensive applications usually executed in parallel.<ref>{{harvnb|Soares| Dantas†|de Macedo|Bauer|2013|p=158}}</ref> Some examples include: [[MapR FS|MapR File System]] (MapR-FS), [[Ceph (storage)|Ceph-FS]], [[BeeGFS|Fraunhofer File System (BeeGFS)]], [[Lustre (file system)|Lustre File System]], [[IBM General Parallel File System]] (GPFS), and [[Parallel Virtual File System]].
 
MapR-FS is a distributed file system that is the basis of the MapR Converged Platform, with capabilities for distributed file storage, a NoSQL database with multiple APIs, and an integrated message streaming system. MapR-FS is optimized for scalability, performance, reliability, and availability. Its file storage capability is compatible with the Apache Hadoop Distributed File System (HDFS) API but with several design characteristics that distinguish it from HDFS. Among the most notable differences are that MapR-FS is a fully read/write filesystem with metadata for files and directories distributed across the namespace, so there is no NameNode.<ref name="mapr-productivity">{{cite web|last1=Perez|first1=Nicolas|title=How MapR improves our productivity and simplifies our design|url=https://medium.com/@anicolaspp/how-mapr-improves-our-productivity-and-simplify-our-design-2d777ab53120#.mvr6mmydr|website=Medium|access-date=June 21, 2016|date=2016-01-02}}</ref><ref>{{cite web|last1=Woodie|first1=Alex|title=From Hadoop to Zeta: Inside MapR's Convergence Conversion|url=http://www.datanami.com/2016/03/08/from-hadoop-to-zeta-inside-maprs-convergence-conversion/|website=Datanami|publisher=Tabor Communications Inc.|access-date=June 21, 2016|date=2016-03-08}}</ref><ref>{{cite web|last1=Brennan|first1=Bob|title=Flash Memory Summit|url=https://www.youtube.com/watch?v=fOT63zR7PvU&t=1682|website=youtube|publisher=Samsung|access-date=June 21, 2016}}</ref><ref name="maprfs-video">{{cite web|last1=Srivas|first1=MC|title=MapR File System|url=https://www.youtube.com/watch?v=fP4HnvZmpZI|website=Hadoop Summit 2011|date=23 July 2011 |publisher=Hortonworks|access-date=June 21, 2016}}</ref><ref name="real-world-hadoop">{{cite book|last1=Dunning|first1=Ted|last2=Friedman|first2=Ellen|title=Real World Hadoop|date=January 2015|publisher=O'Reilly Media, Inc|___location=Sebastopol, 
CA|isbn=978-1-4919-2395-5|pages=23–28|edition=First|chapter-url=http://shop.oreilly.com/product/0636920038450.do|access-date=June 21, 2016|language=en|chapter=Chapter 3: Understanding the MapR Distribution for Apache Hadoop}}</ref>
 
Ceph-FS is a distributed file system that provides excellent performance and reliability.<ref>{{harvnb|Weil|Brandt|Miller|Long|2006|p=307}}</ref> It answers the challenges of dealing with huge files and directories, coordinating the activity of thousands of disks, providing parallel access to metadata on a massive scale, manipulating both scientific and general-purpose workloads, authenticating and encrypting on a large scale, and increasing or decreasing dynamically due to frequent device decommissioning, device failures, and cluster expansions.<ref>{{harvnb|Maltzahn|Molina-Estolano|Khurana|Nelson|2010|p=39}}</ref>
 
BeeGFS is the high-performance parallel file system from the Fraunhofer Competence Centre for High Performance Computing. The distributed metadata architecture of BeeGFS has been designed to provide the scalability and flexibility needed to run [[High performance computing|HPC]] and similar applications with high I/O demands.<ref>{{harvnb|Jacobi|Lingemann|p=10}}</ref>
 
Lustre File System has been designed and implemented to deal with the issue of bottlenecks traditionally found in distributed systems. Lustre is characterized by its efficiency, scalability, and redundancy.<ref>{{harvnb|Schwan Philip|2003 |p=401}}</ref> GPFS was also designed with the goal of removing such bottlenecks.<ref>{{harvnb|Jones|Koniges|Yates|2000 |p=1}}</ref>
 
== Communication ==
High performance of distributed file systems requires efficient communication between computing nodes and fast access to the storage systems. Operations such as open, close, read, write, send, and receive need to be fast, to ensure that performance. For example, each read or write request accesses disk storage, which introduces seek, rotational, and network latencies.<ref>{{harvnb|Upadhyaya|Azimov|Doan|Choi|2008|p=400}}</ref>
 
The data communication (send/receive) operations transfer data from the application buffer to the machine kernel, with [[Transmission Control Protocol|TCP]] controlling the process and being implemented in the kernel. However, in case of network congestion or errors, TCP may not send the data directly. While transferring data from a buffer in the [[kernel (operating system)|kernel]] to the application, the machine does not read the byte stream from the remote machine. In fact, TCP is responsible for buffering the data for the application.<ref>{{harvnb|Upadhyaya|Azimov|Doan|Choi|2008|p=403}}</ref>
 
Choosing the buffer-size, for file reading and writing, or file sending and receiving, is done at the application level. The buffer is maintained using a [[Linked list|circular linked list]].<ref>{{harvnb|Upadhyaya|Azimov|Doan|Choi|2008|p=401}}</ref> It consists of a set of BufferNodes. Each BufferNode has a DataField. The DataField contains the data and a pointer called NextBufferNode that points to the next BufferNode. To find the current position, two [[Pointer (computer programming)|pointers]] are used: CurrentBufferNode and EndBufferNode, that represent the position in the BufferNode for the last write and read positions.
If the BufferNode has no free space, it will send a wait signal to the client to wait until there is available space.<ref>{{harvnb|Upadhyaya|Azimov|Doan|Choi|2008|p=402}}</ref>
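A minimal sketch of this buffering scheme, keeping the BufferNode, CurrentBufferNode, and EndBufferNode naming from the description; the wait signal is modeled as a return value rather than a real client notification:

```python
# Circular linked list of BufferNodes as described above: each node holds
# a DataField and a pointer to the next node; CurrentBufferNode marks the
# next write position and EndBufferNode the next read position.

class BufferNode:
    def __init__(self):
        self.data_field = None        # the DataField holding the data
        self.next_buffer_node = None  # pointer to the next BufferNode

class CircularBuffer:
    def __init__(self, size):
        nodes = [BufferNode() for _ in range(size)]
        for i, node in enumerate(nodes):
            node.next_buffer_node = nodes[(i + 1) % size]  # close the ring
        self.current = nodes[0]  # CurrentBufferNode: next write position
        self.end = nodes[0]      # EndBufferNode: next read position
        self.count = 0
        self.size = size

    def write(self, data):
        if self.count == self.size:
            return "WAIT"  # no free space: tell the client to wait
        self.current.data_field = data
        self.current = self.current.next_buffer_node
        self.count += 1
        return "OK"

    def read(self):
        if self.count == 0:
            return None
        data = self.end.data_field
        self.end = self.end.next_buffer_node
        self.count -= 1
        return data
```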
 
== Cloud-based Synchronization of Distributed File System ==
More and more users have multiple devices with ad hoc connectivity. The data sets replicated on these devices need to be synchronized among an arbitrary number of servers. This is useful for backups and also for offline operation. Indeed, when user network conditions are not good, then the user device will selectively replicate a part of data that will be modified later and off-line. Once the network conditions become good, the device is synchronized.<ref name="Uppoor">{{harvnb|Uppoor|Flouris|Bilas|2010|p=1}}</ref> Two approaches exist to tackle the distributed synchronization issue: user-controlled peer-to-peer synchronization and cloud master-replica synchronization.<ref name="Uppoor" />
* user-controlled peer-to-peer: software such as [[rsync]] must be installed on all users' computers that contain their data. Files are synchronized peer-to-peer, with users specifying network addresses and synchronization parameters; it is thus a manual process.
* cloud master-replica synchronization: widely used by cloud services, in which a master replica is maintained in the cloud, and all updates and synchronization operations are to this master copy, offering a high level of availability and reliability in case of failures.
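The master-replica approach can be sketched as follows; the version-based last-writer-wins rule is an assumption for illustration, not taken from the cited work:

```python
# Toy master-replica sync: every device pushes its offline edits to one
# master copy in the cloud and keeps the master's merged state locally.

class MasterReplica:
    def __init__(self):
        self.files = {}  # path -> (version, content)

    def push(self, path, version, content):
        cur = self.files.get(path, (0, None))
        if version >= cur[0]:  # last-writer-wins on version numbers
            self.files[path] = (version + 1, content)
        return self.files[path]

    def pull(self, path):
        return self.files.get(path)

class Device:
    def __init__(self, master):
        self.master = master
        self.local = {}  # path -> (version, content)

    def edit_offline(self, path, content):
        version, _ = self.local.get(path, (0, None))
        self.local[path] = (version, content)

    def synchronize(self):
        # run once network conditions become good again
        for path, (version, content) in self.local.items():
            self.local[path] = self.master.push(path, version, content)
```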
 
== Security keys ==
 
In cloud computing, the most important [[security]] concepts are [[#Confidentiality|confidentiality]], [[#Integrity|integrity]], and [[#Availability|availability]] ("[[Information security|CIA]]"). Confidentiality becomes indispensable in order to keep private data from being disclosed. Integrity ensures that data is not corrupted.<ref name="Zhifeng 2013 854">{{harvnb|Zhifeng |Yang|2013|p=854}}</ref>
 
=== Confidentiality ===
 
[[Confidentiality]] means that data and computation tasks are confidential: neither cloud provider nor other clients can access the client's data. Much research has been done about confidentiality, because it is one of the crucial points that still presents challenges for cloud computing. A lack of trust in the cloud providers is also a related issue.<ref>{{harvnb|Zhifeng |Yang|2013|pp=845–846}}</ref> The infrastructure of the cloud must ensure that customers' data will not be accessed by unauthorized parties.
 
The environment becomes insecure if the service provider can do all of the following:<ref>{{harvnb|Yau|An|2010|p=353}}</ref>
* locate the consumer's data in the cloud
* access and retrieve consumer's data
* understand the meaning of the data (types of data, functionalities and interfaces of the application and format of the data).
 
The geographic ___location of data helps determine privacy and confidentiality. The ___location of clients should be taken into account. For example, clients in Europe won't be interested in using datacenters located in United States, because that affects the guarantee of the confidentiality of data. In order to deal with that problem, some cloud computing vendors have included the geographic ___location of the host as a parameter of the service-level agreement made with the customer,<ref>{{harvnb|Vecchiola|Pandey|Buyya|2009|p=14}}</ref> allowing users to choose themselves the locations of the servers that will host their data.
 
Another approach involves data encryption.<ref>{{harvnb|Yau|An|2010|p=352}}</ref> Otherwise, there will be serious risk of unauthorized use. A variety of solutions exists, such as encrypting only sensitive data,<ref>{{harvnb|Mowbray|Pearson|2009}}</ref> and supporting only some operations, in order to simplify computation.<ref>{{harvnb|Naehrig|Lauter|2013}}</ref> Furthermore, cryptographic techniques and tools as [[Homomorphic encryption|FHE]], are used to preserve privacy in the cloud.<ref name="Zhifeng 2013 854" />
 
=== Integrity ===
 
Integrity in cloud computing implies [[data integrity]] as well as [[computing integrity]]. Such integrity means that data has to be stored correctly on cloud servers and, in case of failures or incorrect computing, that problems have to be detected.
 
Data integrity can be affected by malicious events or from administration errors (e.g. during [[backup]] and restore, [[data migration]], or changing memberships in [[Peer-to-peer|P2P]] systems).<ref>{{harvnb|Zhifeng|Yang|2013|p=5}}</ref>
 
Integrity is easy to achieve using cryptography (typically through [[message-authentication code]], or MACs, on data blocks).<ref>{{harvnb|Juels|Oprea|2013|p=4}}</ref>
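For example, per-block MACs can be computed and verified with standard-library HMAC; this is a generic illustration of the idea, not any specific cloud provider's scheme:

```python
# Block-level integrity with MACs: compute an HMAC tag per stored block,
# then verify the tag on read to detect corruption or tampering.
import hmac
import hashlib

def mac_blocks(key, blocks):
    """Return one HMAC-SHA256 tag per data block."""
    return [hmac.new(key, b, hashlib.sha256).digest() for b in blocks]

def verify_block(key, block, tag):
    """Recompute the tag and compare in constant time."""
    expected = hmac.new(key, block, hashlib.sha256).digest()
    return hmac.compare_digest(expected, tag)
```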
 
There exist checking mechanisms that verify data integrity. For instance:
* HAIL (High-Availability and Integrity Layer) is a distributed cryptographic system that allows a set of servers to prove to a client that a stored file is intact and retrievable.<ref>{{harvnb|Bowers |Juels |Oprea|2009 }}</ref>
* PORs (proofs of [[retrievability]] for large files)<ref>{{harvnb|Juels |S. Kaliski |2007|p=2 }}</ref> is based on a symmetric cryptographic system, where there is only one verification key that must be stored in a file to improve its integrity. This method serves to encrypt a file F and then generate a random string named a "sentinel" that must be added at the end of the encrypted file. The server cannot locate the sentinel, which is impossible to differentiate from other blocks, so a small change would indicate whether the file has been changed or not.
* PDP (provable data possession) checking is a class of efficient and practical methods that provide an efficient way to check data integrity on untrusted servers:
** PDP:<ref>{{harvnb|Ateniese |Burns |Curtmola|Herring|Kissner|Peterson|Song|2007}}</ref> Before storing the data on a server, the client must store, locally, some meta-data. At a later time, and without downloading data, the client is able to ask the server to check that the data has not been falsified. This approach is used for static data.
** Scalable PDP:<ref>{{harvnb|Ateniese |Di Pietro |V. Mancini|Tsudik|2008 |pp=5, 9}}</ref> This approach is premised upon a symmetric-key, which is more efficient than public-key encryption. It supports some dynamic operations (modification, deletion, and append) but it cannot be used for public verification.
** Dynamic PDP:<ref>{{harvnb|Erway |Küpçü |Tamassia|Papamanthou|2009|p=2}}</ref> This approach extends the PDP model to support several update operations such as append, insert, modify, and delete, which is well suited for intensive computation.
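A toy version of the sentinel idea from the PORs bullet above; real PORs encrypt the file so that sentinels are indistinguishable from data blocks, which this sketch omits:

```python
# Sentinel-style spot check: random sentinel blocks are hidden among the
# data blocks at encoding time; the verifier remembers their positions
# and values, and later audits one at random. Illustrative only.
import os
import random

def encode(blocks, n_sentinels, seed):
    rng = random.Random(seed)
    sentinels = [os.urandom(8) for _ in range(n_sentinels)]
    stored = list(blocks) + list(sentinels)
    rng.shuffle(stored)  # hide sentinel positions among the data
    # verifier keeps, per sentinel, its (position, value)
    checks = [(stored.index(s), s) for s in sentinels]
    return stored, checks

def audit(stored, checks):
    """Spot-check one sentinel position on the stored copy."""
    pos, value = random.choice(checks)
    return stored[pos] == value
```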
 
=== Availability ===
[[Availability]] is generally effected by [[Replication (computing)|replication]].<ref name="availability">{{harvnb|Bonvin|Papaioannou|Aberer|2009|p=206}}</ref><ref>{{harvnb|Cuong|Cao|Kalbarczyk|Iyer|2012|p=5}}</ref>
<ref>{{harvnb|A.| A.|P.|2011|p=3}}</ref><ref>{{harvnb|Qian |D.|T.|2011|p=3}}</ref> Meanwhile, consistency must be guaranteed. However, consistency and availability cannot be achieved at the same time; each is prioritized at some sacrifice of the other. A balance must be struck.<ref>{{harvnb|Vogels|2009|p=2}}</ref>
 
Data must have an identity to be accessible. For instance, Skute<ref name="availability" /> is a mechanism based on key/value storage that allows dynamic data allocation in an efficient way. Each server must be identified by a label in the form continent-country-datacenter-room-rack-server. The server can reference multiple virtual nodes, with each node having a selection of data (or multiple partitions of multiple data). Each piece of data is identified by a key space which is generated by a one-way cryptographic hash function (e.g. [[MD5]]) and is localised by the hash function value of this key. The key space may be partitioned into multiple partitions with each partition referring to a piece of data. To perform replication, virtual nodes must be replicated and referenced by other servers. To maximize data durability and data availability, the replicas must be placed on different servers and every server should be in a different geographical ___location, because data availability increases with geographical diversity. The process of replication includes an evaluation of space availability, which must be above a certain minimum threshold on each chunk server. Otherwise, data are replicated to another chunk server. Each partition, i, has an availability value represented by the following formula:
 
<math>avail_i=\sum_{i=0}^{|s_i|}\sum_{j=i+1}^{|s_i|} conf_i \cdot conf_j \cdot diversity(s_i,s_j)</math>
 
where <math>s_i</math> are the servers hosting the replicas, <math>conf_i</math> and <math>conf_j</math> are the confidence of servers <math>i</math> and <math>j</math> (relying on technical factors such as hardware components and non-technical ones like the economic and political situation of a country), and the diversity is the geographical distance between <math>s_i</math> and <math>s_j</math>.<ref>{{harvnb|Bonvin|Papaioannou|Aberer|2009|p=208}}</ref>
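Read numerically, the formula sums the product of the two servers' confidences and their diversity over every pair of replica servers; a small sketch with made-up server data:

```python
# Pairwise availability score as in the formula above:
# sum over all pairs (i, j) of conf_i * conf_j * diversity(s_i, s_j).
# Server names, confidences, and the distance function are invented.

def availability(servers, diversity):
    """servers: list of (name, confidence); diversity(a, b) -> distance."""
    total = 0.0
    for i in range(len(servers)):
        for j in range(i + 1, len(servers)):
            (a, conf_a), (b, conf_b) = servers[i], servers[j]
            total += conf_a * conf_b * diversity(a, b)
    return total
```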
 
Replication is a great solution to ensure data availability, but it costs too much in terms of memory space.<ref name="ReferenceB">{{harvnb|Carnegie|Tantisiriroj|Xiao|Gibson|2009|p=1}}</ref> DiskReduce<ref name="ReferenceB" /> is a modified version of HDFS that's based on [[w:RAID|RAID]] technology (RAID-5 and RAID-6) and allows asynchronous encoding of replicated data. Indeed, there is a background process which looks for widely replicated data and deletes extra copies after encoding it. Another approach is to replace replication with erasure coding.<ref name="ReferenceC">{{harvnb|Wang|Gong|P.|Xie|2012|p=1}}</ref> In addition, to ensure data availability there are many approaches that allow for data recovery. In fact, data must be coded, and if it is lost, it can be recovered from fragments which were constructed during the coding phase.<ref>{{harvnb|Abu-Libdeh|Princehouse|Weatherspoon|2010|p=2}}</ref> Some other approaches that apply different mechanisms to guarantee availability are: Reed-Solomon code of Microsoft Azure and RaidNode for HDFS. Also Google is still working on a new approach based on an erasure-coding mechanism.<ref>{{harvnb|Wang|Gong|P.|Xie|2012|p=9}}</ref>
 
There is no RAID implementation for cloud storage.<ref name="ReferenceC" />
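The RAID-5-style idea behind DiskReduce can be illustrated with single XOR parity: one parity block lets any one lost block be rebuilt, at a fraction of the storage overhead of triple replication. A simplified sketch, not DiskReduce's actual encoding:

```python
# Single-parity erasure code (RAID-5 style): the parity block is the XOR
# of all data blocks, so any one missing block is the XOR of the parity
# with the surviving blocks.

def xor_parity(blocks):
    """Compute the XOR parity of equal-length byte blocks."""
    parity = bytes(len(blocks[0]))
    for b in blocks:
        parity = bytes(x ^ y for x, y in zip(parity, b))
    return parity

def recover(blocks_with_hole, parity):
    """blocks_with_hole: the block list with exactly one None (lost)."""
    missing = blocks_with_hole.index(None)
    acc = parity
    for i, b in enumerate(blocks_with_hole):
        if i != missing:
            acc = bytes(x ^ y for x, y in zip(acc, b))
    return acc
```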
 
== Economic aspects ==
 
The cloud computing economy is growing rapidly. US government cloud spending is growing at a 40% [[compound annual growth rate]] (CAGR), and is expected to reach 7 billion dollars by 2015.<ref>{{harvnb|Lori M. Kaufman|2009|p=2}}</ref>
 
More and more companies have been utilizing cloud computing to manage the massive amount of data and to overcome the lack of storage capacity, and because it enables them to use such resources as a service, ensuring that their computing needs will be met without having to invest in infrastructure (Pay-as-you-go model).<ref>{{harvnb|Angabini|Yazdani|Mundt|Hassani|2011|p=1}}</ref>
 
Every application provider has to periodically pay the cost of each server where replicas of data are stored. The cost of a server is determined by the quality of the hardware, the storage capacities, and its query-processing and communication overhead.<ref>{{harvnb|Bonvin|Papaioannou|Aberer|2009|p=3}}</ref> Cloud computing allows providers to scale their services according to client demands.
 
The pay-as-you-go model has also eased the burden on startup companies that wish to benefit from compute-intensive business. Cloud computing also offers an opportunity to many third-world countries that wouldn't have such computing resources otherwise. Cloud computing can lower IT barriers to innovation.<ref>{{harvnb|Marston|Lia|Bandyopadhyaya|Zhanga|2011|p=3}}</ref>
 
Despite the wide utilization of cloud computing, efficient sharing of large volumes of data in an untrusted cloud is still a challenge.
 
== References ==
{{reflist|30em}}
 
Although the wide utilization of cloud computing, an efficient sharing of large volumes of data in an untrusted cloud is still a challenging research topic.
 
==References==
{{Reflist|4}}
 
== Bibliography ==
* {{cite book
| last1 = Tanenbaum
| first1 = Andrew S.
| last2 = Van Steen
| first2 = Maarten
| year = 2006
| title = Distributed systems principles and paradigms
| url = http://net.pku.edu.cn/~course/cs501/2011/resource/2006-Book-distributed%20systems%20principles%20and%20paradigms%202nd%20edition.pdf
| access-date = 2014-01-10
| archive-date = 2013-08-20
| archive-url = https://web.archive.org/web/20130820190519/http://net.pku.edu.cn/~course/cs501/2011/resource/2006-Book-distributed%20systems%20principles%20and%20paradigms%202nd%20edition.pdf
| url-status = dead
}}
* {{cite web
| first = Fabio |last = Kon
| title = Distributed File Systems Past, Present and Future: A Distributed File System for 2006
| url = https://www.researchgate.net/publication/2439179
| year = 1996
| website = [[ResearchGate]]
}}
* {{cite web
| author = Pavel Bžoch
| url = http://www.kiv.zcu.cz/site/documents/verejne/vyzkum/publikace/technicke-zpravy/2012/tr-2012-02.pdf
| title = Distributed File Systems Past, Present and Future A Distributed File System for 2006 (1996)
}}
* {{cite web
| author = Sun Microsystems
| url = http://www.cse.chalmers.se/~tsigas/Courses/DCDSeminar/Files/afs_report.pdf
| title = Distributed file systems – an overview
}}
* {{cite web
| last1 = Jacobi
| first1 = Tim-Daniel
| last2 = Lingemann
| first2 = Jan
| url = http://wr.informatik.uni-hamburg.de/_media/research/labs/2012/2012-10-tim-daniel_jacobi_jan_lingemann-evaluation_of_distributed_file_systems-report.pdf
| title = Evaluation of Distributed File Systems
| access-date = 2014-01-24
| archive-date = 2014-02-03
| archive-url = https://web.archive.org/web/20140203140412/http://wr.informatik.uni-hamburg.de/_media/research/labs/2012/2012-10-tim-daniel_jacobi_jan_lingemann-evaluation_of_distributed_file_systems-report.pdf
| url-status = dead
}}
# Architecture, structure, and design:
#* {{cite book
| last1 = Zhang
| first1 = Qi-fei
| last2 = Pan
| first2 = Xue-zeng
| last3 = Shen
| first3 = Yan
| last4 = Li
| first4 = Wen-juan
| title = 2012 IEEE International Conference on Cluster Computing Workshops
| pages = 41
| year = 2012
| doi = 10.1109/ClusterW.2012.27
| s2cid = 12430485
| chapter = A Novel Scalable Architecture of Cloud Storage System for Small Files Based on P2P
| isbn = 978-0-7695-4844-9
}}
#* {{cite book
| last1 = Azzedin
| first1 = Farag
| title = 2013 International Conference on Collaboration Technologies and Systems (CTS)
| year = 2013
| doi = 10.1109/CTS.2013.6567222
| s2cid = 45293053
| pages = 155–161
| chapter = Towards a scalable HDFS architecture
| isbn = 978-1-4673-6404-1
}}
#* {{Cite web
| last1 = Krzyzanowski
| first1 = Paul
| title = Distributed File Systems
| year = 2012
| url = http://www.cs.rutgers.edu/~pxk/417/notes/16-dfs.pdf
| access-date = 2013-12-27
| archive-date = 2013-12-27
| archive-url = https://web.archive.org/web/20131227152320/http://www.cs.rutgers.edu/~pxk/417/notes/16-dfs.pdf
| url-status = dead
}}
#* {{cite conference
| last1 = Kobayashi | first1 = K
| last2 = Mikami| first2 = S
| last3 = Kimura| first3 = H
| last4 = Tatebe| first4 = O
| year = 2011
| title = The Gfarm File System on Compute Clouds
| conference = Parallel and Distributed Processing Workshops and Phd Forum (IPDPSW), 2011 IEEE International Symposium on
| conference-url = https://ieeexplore.ieee.org/xpl/conhome/6008655/proceeding
| doi = 10.1109/IPDPS.2011.255
}}
#* {{cite book
| last1 = Humbetov
| first1 = Shamil
| title = 2012 6th International Conference on Application of Information and Communication Technologies (AICT)
| year = 2012
| doi = 10.1109/ICAICT.2012.6398489
| s2cid = 6113112
| pages = 1–5
| chapter = Data-intensive computing with map-reduce and hadoop
| isbn = 978-1-4673-1740-5
}}
#* {{cite journal
| last1 = Hsiao
| first1 = Hung-Chang
| last2 = Chung
| first2 = Hsueh-Yi
| last3 = Shen
| first3 = Haiying
| last4 = Chao
| first4 = Yu-Chang
| title = Load Rebalancing for Distributed File Systems in Clouds
| journal = IEEE Transactions on Parallel and Distributed Systems
| year = 2013
| doi = 10.1109/TPDS.2012.196
| s2cid = 11271386
| pages = 951–962
| volume = 24
| issue = 5
}}
#* {{cite book
| last1 = Fan
| first1 = Kai
| last2 = Zhang
| first2 = Dayang
| last3 = Li
| first3 = Hui
| last4 = Yang
| first4 = Yintang
| title = 2013 5th International Conference on Intelligent Networking and Collaborative Systems
| year = 2013
| doi = 10.1109/INCoS.2013.14
| s2cid = 14821266
| pages = 23–29
| chapter = An Adaptive Feedback Load Balancing Algorithm in HDFS
| isbn = 978-0-7695-4988-0
}}
#* {{cite book
| last1 = Upadhyaya
| first1 = B
| last2 = Azimov
| first2 = F
| last3 = Doan
| first3 = T.T
| last4 = Choi
| first4 = Eunmi
| last5 = Kim
| first5 = Sangbum
| last6 = Kim
| first6 = Pilsung
| title = 2008 Fourth International Conference on Networked Computing and Advanced Information Management
| year = 2008
| doi = 10.1109/NCM.2008.164
| s2cid = 18933772
| pages = 400–405
| chapter = Distributed File System: Efficiency Experiments for Data Access and Communication
| isbn = 978-0-7695-3322-3
}}
#* {{cite book
| last1 = Soares
| first1 = Tiago S.
| last2 = Dantas
| first2 = M.A.R
| last3 = de Macedo
| first3 = Douglas D.J.
| last4 = Bauer
| first4 = Michael A
| title = 2013 Workshops on Enabling Technologies: Infrastructure for Collaborative Enterprises
| year = 2013
| doi = 10.1109/WETICE.2013.12
| s2cid = 6155753
| pages = 158–163
| chapter = A Data Management in a Private Cloud Storage Environment Utilizing High Performance Distributed File Systems
| isbn = 978-1-4799-0405-1
}}
#* {{cite book
| last1 = Adamov
| first1 = Abzetdin
| title = 2012 6th International Conference on Application of Information and Communication Technologies (AICT)
| year = 2012
| doi = 10.1109/ICAICT.2012.6398484
| s2cid = 16674289
| pages = 1–3
| chapter = Distributed file system as a basis of data-intensive computing
| isbn = 978-1-4673-1740-5
}}
#* {{cite journal
| author = Philip Schwan
| title = Lustre: Building a File System for 1,000-node Clusters
| periodical = Proceedings of the 2003 Linux Symposium
| year = 2003
| url = https://www.kernel.org/doc/ols/2003/ols2003-pages-380-386.pdf
| pages = 400–407
}}
#* {{cite journal
| last1 = Jones
| first1 = Terry
| last2 = Koniges
| first2 = Alice
| last3 = Yates
| first3 = R. Kim
| title = Performance of the IBM General Parallel File System
| periodical = Parallel and Distributed Processing Symposium, 2000. IPDPS 2000. Proceedings. 14th International
| year = 2000
| url = https://computing.llnl.gov/code/sio/GPFS_performance.pdf
| access-date = 2014-01-24
| archive-date = 2013-02-26
| archive-url = https://web.archive.org/web/20130226053255/https://computing.llnl.gov/code/sio/GPFS_performance.pdf
| url-status = dead
}}
#* {{cite conference
| last1 = Weil
| first1 = Sage A.
| last2 = Brandt
| first2 = Scott A.
| last3 = Miller
| first3 = Ethan L.
| last4 = Long
| first4 = Darrell D. E.
| title = Ceph: A Scalable, High-Performance Distributed File System
| year = 2006
| url = http://www.ssrc.ucsc.edu/Papers/weil-osdi06.pdf
| conference = Proceedings of the 7th Conference on Operating Systems Design and Implementation (OSDI '06)
| access-date = 2014-01-24
| archive-date = 2012-03-09
| archive-url = https://web.archive.org/web/20120309021423/http://www.ssrc.ucsc.edu/Papers/weil-osdi06.pdf
| url-status = dead
}}
#* {{cite report
| last1 = Maltzahn
| first1 = Carlos
| last2 = Molina-Estolano
| first2 = Esteban
| last3 = Khurana
| first3 = Amandeep
| last4 = Nelson
| first4 = Alex J.
| last5 = Brandt
| first5 = Scott A.
| last6 = Weil
| first6 = Sage
| title = Ceph as a scalable alternative to the Hadoop Distributed File System
| year = 2010
| url = https://www.usenix.org/legacy/publications/login/2010-08/openpdfs/maltzahn.pdf
}}
#* {{cite book
| last1 = Brandt
| first1 = S.A.
| last2 = Miller
| first2 = E.L.
| last3 = Long
| first3 = D.D.E.
| last4 = Xue
| first4 = Lan
| title = 20th IEEE/11th NASA Goddard Conference on Mass Storage Systems and Technologies, 2003. (MSST 2003). Proceedings
| year = 2003
| doi = 10.1109/MASS.2003.1194865
| pages = 290–298
| chapter = Efficient metadata management in large distributed storage systems
| isbn = 978-0-7695-1914-2
| citeseerx = 10.1.1.13.2537
| s2cid = 5548463
}}
#* {{cite journal
| last1 = Gibson
| first1 = Garth A.
| last2 = Van Meter
| first2 = Rodney M.
| title = Network attached storage architecture
| periodical = Communications of the ACM
| volume = 43
| pages = 37–45
| number = 11
| date = November 2000
| url = https://www.cs.cmu.edu/~garth/CACM/CACM00-p37-gibson.pdf
| doi = 10.1145/353360.353362
| s2cid = 207644891
}}
#* {{cite arXiv
| last1 = Yee
| first1 = Tin Tin
| last2 = Thu Naing
| first2 = Thinn
| title = PC-Cluster based Storage System Architecture for Cloud Storage
| year = 2011
| eprint=1112.2025
| class = cs.DC
}}
#* {{cite book
| last1 = Cho Cho
| first1 = Khaing
| title = 2011 IEEE International Conference on Cloud Computing and Intelligence Systems
| last2 = Thinn Thu
| first2 = Naing
| s2cid = 224635
| year = 2011
| doi = 10.1109/CCIS.2011.6045066
| pages = 235–239
| chapter = The efficient data storage management system on cluster-based private cloud data center
| isbn = 978-1-61284-203-5
}}
#* {{cite book
| last1 = S.A.
| first1 = Brandt
| title = 2011 3rd Symposium on Web Society
| last2 = E.L.
| first2 = Miller
| last3 = D.D.E.
| first3 = Long
| last4 = Lan
| first4 = Xue
| year = 2011
| doi = 10.1109/SWS.2011.6101263
| s2cid = 14791637
| pages = 16–20
| chapter = A carrier-grade service-oriented file storage architecture for cloud computing
| isbn = 978-1-4577-0211-2
}}
#* {{cite book
| last1 = Ghemawat
| first1 =Sanjay
| title = Proceedings of the nineteenth ACM symposium on Operating systems principles – SOSP '03
| last2 = Gobioff
| first2 =Howard
| last3 = Leung
| first3 =Shun-Tak
| year = 2003
| doi = 10.1145/945445.945450
| pages = 29–43
| chapter = The Google file system
| isbn = 978-1-58113-757-6
| s2cid = 221261373
}}
# Security
#* {{cite book
| last1 = Vecchiola
| first1 = C
| title = 2009 10th International Symposium on Pervasive Systems, Algorithms, and Networks
| last2 = Pandey
| first2 = S
| last3 = Buyya
| first3 = R
| year = 2009
| doi = 10.1109/I-SPAN.2009.150
| pages = 4–16
| chapter = High-Performance Cloud Computing: A View of Scientific Applications
| isbn = 978-1-4244-5403-7
| arxiv = 0910.1979
| s2cid = 1810240
}}
#* {{cite book
| last1 = Mowbray
| first1 = Miranda
| last2 = Pearson
| first2 = Siani
| title = Proceedings of the Fourth International ICST Conference on COMmunication System softWAre and middlewaRE – COMSWARE '09
| pages = 1
| s2cid = 10130310
| year = 2009
| doi = 10.1145/1621890.1621897
| chapter = A client-based privacy manager for cloud computing
| isbn = 978-1-60558-353-2
}}
#* {{cite book
| last1 = Naehrig
| first1 = Michael
| title = Proceedings of the 3rd ACM workshop on Cloud computing security workshop – CCSW '11
| last2 = Lauter
| first2 = Kristin
| year = 2013
| doi = 10.1145/2046660.2046682
| pages = 113–124
| chapter = Can homomorphic encryption be practical?
| isbn = 978-1-4503-1004-8
| citeseerx = 10.1.1.225.8007
| s2cid = 12274859
}}
#* {{cite book
| last1 = Du
| first1 = Hongtao
| last2 = Li
| first2 = Zhanhuai
| title = 2012 International Conference on Measurement, Information and Control (MIC)
|volume = 1
| year = 2012
| doi = 10.1109/MIC.2012.6273264
| s2cid = 40685246
| pages = 327–331
| chapter = PsFS: A high-throughput parallel file system for secure Cloud Storage system
| isbn = 978-1-4577-1604-1
}}
#* {{cite journal
| last1 = Brandt
| first1 = Scott A.
| last2 = Miller
| first2 = Ethan L.
| last3 = Long
| first3 = Darrell D.E.
| last4 = Xue
| first4 = Lan
| title = Efficient Metadata Management in Large Distributed Storage Systems
| periodical = 11th NASA Goddard Conference on Mass Storage Systems and Technologies, San Diego, CA
| year = 2003
| url = http://www.ssrc.ucsc.edu/Papers/brandt-mss03.pdf
| access-date = 2013-12-27
| archive-date = 2013-08-22
| archive-url = https://web.archive.org/web/20130822213717/http://www.ssrc.ucsc.edu/Papers/brandt-mss03.pdf
| url-status = dead
}}
#* {{cite journal
| author = Lori M. Kaufman
| s2cid = 16233643
| title = Data Security in the World of Cloud Computing
| journal = IEEE Security & Privacy
| year = 2009
| doi = 10.1109/MSP.2009.87
| pages = 61–64
| volume = 7
| issue = 4
}}
#* {{cite book
| last1 = Bowers
| first1 = Kevin
| last2 = Juels
| first2 = Ari
| last3 = Oprea
| first3 = Alina
| title = Proceedings of the 16th ACM conference on Computer and communications security
| chapter = HAIL: A high-availability and integrity layer for cloud storage
| s2cid = 207176701
| year = 2009
| doi = 10.1145/1653662.1653686
| pages = 187–198
| isbn = 978-1-60558-894-0
}}
#* {{cite journal
| last1 = Juels
| first1 = Ari
| last2 = Oprea
| first2 = Alina
| s2cid = 17596621
| title = New approaches to security and availability for cloud data
| doi = 10.1145/2408776.2408793
| pages = 64–73
| journal = Communications of the ACM
| volume = 56
| number = 2
| date = February 2013
}}
#* {{cite book
| last1 = Zhang
| first1 = Jing
| title = 2012 ACM/IEEE 13th International Conference on Grid Computing
| last2 = Wu
| first2 = Gongqing
| last3 = Hu
| first3 = Xuegang
| last4 = Wu
| first4 = Xindong
| year = 2012
| doi = 10.1109/Grid.2012.17
| s2cid = 10778240
| pages = 12–21
| chapter = A Distributed Cache for Hadoop Distributed File System in Real-Time Cloud Services
| isbn = 978-1-4673-2901-9
}}
#* {{cite book
| last1 = Pan
| first1 = A.
| last2 = Walters
| first2 = J.P.
| last3 = Pai
| first3 = V.S.
| last4 = Kang
| first4 = D.-I.D.
| last5 = Crago
| first5 = S.P.
| title = 2012 SC Companion: High Performance Computing, Networking Storage and Analysis
| year = 2012
| doi = 10.1109/SC.Companion.2012.103
| s2cid = 5554936
| pages = 753–759
| chapter = Integrating High Performance File Systems in a Cloud Computing Environment
| isbn = 978-0-7695-4956-9
}}
#* {{cite book
| last1 = Tseng
| first1 = Fan-Hsun
| last2 = Chen
| first2 = Chi-Yuan
| last3 = Chou
| first3 = Li-Der
| last4 = Chao
| first4 = Han-Chieh
| title = 2012 International Symposium on Intelligent Signal Processing and Communications Systems
| year = 2012
| doi = 10.1109/ISPACS.2012.6473485
| s2cid = 18260943
| pages = 227–232
| chapter = Implement a reliable and secure cloud distributed file system
| isbn = 978-1-4673-5082-2
}}
#* {{cite book
| last1 = Di Sano
| first1 = M
| title = 2012 IEEE 21st International Workshop on Enabling Technologies: Infrastructure for Collaborative Enterprises
| last2 = Di Stefano
| first2 = A
| last3 = Morana
| first3 = G
| last4 = Zito
| first4 = D
| year = 2012
| doi = 10.1109/WETICE.2012.104
| s2cid = 19798809
| pages = 173–178
| chapter = File System As-a-Service: Providing Transient and Consistent Views of Files to Cooperating Applications in Clouds
| isbn = 978-1-4673-1888-4
}}
#* {{cite journal
| last1 = Xiao
| first1 = Zhifeng
| last2 = Xiao
| first2 = Yang
| s2cid = 206583820
| title = Security and Privacy in Cloud Computing
| periodical = IEEE Communications Surveys and Tutorials
| year = 2013
| doi = 10.1109/SURV.2012.060912.00182
| pages = 843–859
| volume = 15
| issue = 2
| citeseerx = 10.1.1.707.3980
}}
#* {{cite journal
| last1 = Sheng
| first1 = Zhonghua
| last2 = Ma
| first2 = Zhiqiang
| last3 = Gu
| first3 = Lin
| last4 = Li
| first4 = Ang
| title = A privacy-protecting file system on public cloud storage
| periodical = Cloud and Service Computing (CSC), 2011 International Conference on
| year = 2011
| doi = 10.1109/CSC.2011.6138512
| pages = 141–149
}}
#* {{Cite web
| last1 = Horrigan
| first1 = John B
| title = Use of cloud computing applications and services
| year = 2008
| url = http://www.pewinternet.org/~/media//Files/Reports/2008/PIP_Cloud.Memo.pdf.pdf
| access-date = 2013-12-27
| archive-date = 2013-07-12
| archive-url = https://web.archive.org/web/20130712182757/http://www.pewinternet.org/~/media//Files/Reports/2008/PIP_Cloud.Memo.pdf.pdf
| url-status = dead
}}
#* {{cite journal
| last1 = Yau
| first1 = Stephen
| last2 = An
| first2 = Ho
| title = Confidentiality Protection in cloud computing systems
| periodical = Int J Software Informatics
| year = 2010
| url = http://www.ijsi.org/ch/reader/create_pdf.aspx?file_no=i68&flag=&journal_id=ijsi&year_id=2010
| pages = 351–365
}}
#* {{cite book
| last1 = Fan
| first1 = Bin
| last2 = Tantisiriroj
| first2 = Wittawat
| last3 = Xiao
| first3 = Lin
| last4 = Gibson
| first4 = Garth
| title = Proceedings of the 4th Annual Workshop on Petascale Data Storage
| chapter = DiskReduce: RAID for data-intensive scalable computing
| s2cid = 15194567
| year = 2009
| doi = 10.1145/1713072.1713075
| pages = 6–10
| isbn = 978-1-60558-883-4
}}
#* {{cite book
| last1 = Wang
| first1 = Jianzong
| title = 2012 ACM/IEEE 13th International Conference on Grid Computing
| last2 = Gong
| first2 = Weijiao
| last3 = Varman
| first3 = P.
| last4 = Xie
| first4 = Changsheng
| s2cid = 16827141
| year = 2012
| doi = 10.1109/Grid.2012.29
| pages = 174–183
| chapter = Reducing Storage Overhead with Small Write Bottleneck Avoiding in Cloud RAID System
| isbn = 978-1-4673-2901-9
}}
#* {{cite book
| last1 = Abu-Libdeh
| first1 = Hussam
| last2 = Princehouse
| first2 = Lonnie
| last3 = Weatherspoon
| first3 = Hakim
| title = Proceedings of the 1st ACM symposium on Cloud computing
| chapter = RACS: A case for cloud storage diversity
| s2cid = 1283873
| year = 2010
| doi = 10.1145/1807128.1807165
| pages = 229–240
| isbn = 978-1-4503-0036-0
}}
#* {{cite journal
| last1 = Vogels
| first1 = Werner
| title = Eventually consistent
| journal = Communications of the ACM
| volume = 52
| number = 1
| year = 2009
| doi = 10.1145/1435417.1435432
| doi-access = free
| pages = 40–44
}}
#* {{cite journal
| last1 = Plantard
| first1 = T.
| last2 = Susilo
| first2 = W.
| last3 = Zhang
| first3 = Z.
| title = Fully Homomorphic Encryption Using Hidden Ideal Lattice
| periodical = IEEE Transactions on Information Forensics and Security
| volume = 8
| issue = 12
| year = 2013
| doi = 10.1109/TIFS.2013.2287732
| pages = 2127–2137
}}
#* {{cite book
| last1 = Pham
| first1 = Cuong
| last2 = Cao
| first2 = Phuong
| last3 = Kalbarczyk
| first3 = Z.
| last4 = Iyer
| first4 = R. K.
| title = IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN 2012)
| chapter = Toward a high availability cloud: Techniques and challenges
| s2cid = 9920903
| year = 2012
| doi = 10.1109/DSNW.2012.6264687
| pages = 1–6
| isbn = 978-1-4673-2266-9
}}
#* {{cite book
| last1 = Undheim
| first1 = A.
| title = 2011 IEEE/ACM 12th International Conference on Grid Computing
| last2 = Chilwan
| first2 = A.
| last3 = Heegaard
| first3 = P.
| s2cid = 15047580
| year = 2011
| doi = 10.1109/Grid.2011.25
| pages = 129–136
| chapter = Differentiated Availability in Cloud Computing SLAs
| isbn = 978-1-4577-1904-2
}}
#* {{cite book
| last1 = Qian
| first1 = Haiyang
| last2 = Medhi
| first2 = D.
| last3 = Trivedi
| first3 = K.
| title = 12th IFIP/IEEE International Symposium on Integrated Network Management (IM 2011) and Workshops
| chapter = A hierarchical model to evaluate quality of experience of online services hosted by cloud computing
| year = 2011
| doi = 10.1109/INM.2011.5990680
| pages = 105–112
| isbn = 978-1-4244-9219-0
| citeseerx = 10.1.1.190.5148
| s2cid = 15912111
}}
#* {{cite book
| last1 = Ateniese
| first1 = Giuseppe
| title = Proceedings of the 14th ACM conference on Computer and communications security – CCS '07
| last2 = Burns
| first2 = Randal
| last3 = Curtmola
| first3 = Reza
| last4 = Herring
| first4 = Joseph
| last5 = Kissner
| first5 = Lea
| last6 = Peterson
| first6 = Zachary
| last7 = Song
| first7 = Dawn
| s2cid = 8010083
| year = 2007
| doi = 10.1145/1315245.1315318
| pages = 598–609
| chapter = Provable data possession at untrusted stores
| isbn = 978-1-59593-703-2
| url = https://figshare.com/articles/journal_contribution/6469184
}}
#* {{cite book
| last1 = Ateniese
| first1 = Giuseppe
| title = Proceedings of the 4th international conference on Security and privacy in communication networks – SecureComm '08
| pages = 1
| last2 = Di Pietro
| first2 = Roberto
| last3 = Mancini
| first3 = Luigi V.
| last4 = Tsudik
| first4 = Gene
| year = 2008
| doi = 10.1145/1460877.1460889
| chapter = Scalable and efficient provable data possession
| isbn = 978-1-60558-241-2
| citeseerx = 10.1.1.208.8270
| s2cid = 207170639
}}
#* {{cite book
| last1 = Erway
| first1 = Chris
| title = Proceedings of the 16th ACM conference on Computer and communications security – CCS '09
| last2 = Küpçü
| first2 = Alptekin
| last3 = Tamassia
| first3 = Roberto
| last4 = Papamanthou
| first4 = Charalampos
| s2cid = 52856440
| year = 2009
| doi = 10.1145/1653662.1653688
| pages = 213–222
| chapter = Dynamic provable data possession
| isbn = 978-1-60558-894-0
}}
#* {{cite book
| last1 = Juels
| first1 = Ari
| last2 = Kaliski
| first2 = Burton S.
| title = Proceedings of the 14th ACM conference on Computer and communications security
| chapter = Pors: Proofs of retrievability for large files
| s2cid = 6032317
| year = 2007
| doi = 10.1145/1315245.1315317
| pages = 584–597
| isbn = 978-1-59593-703-2
}}
#* {{cite book
| last1 = Bonvin
| first1 =Nicolas
| title = Proceedings of the 1st ACM symposium on Cloud computing – SoCC '10
| last2 = Papaioannou
| first2 =Thanasis
| last3 = Aberer
| first3 = Karl
| s2cid = 3261817
| year = 2010
| doi = 10.1145/1807128.1807162
| pages = 205–216
| chapter = A self-organized, fault-tolerant and scalable replication scheme for cloud storage
| isbn = 978-1-4503-0036-0
| url =http://infoscience.epfl.ch/record/146774
}}
#* {{cite journal
| last1 = Kraska
| first1 = Tim
| last2 = Hentschel
| first2 = Martin
| last3 = Alonso
| first3 = Gustavo
| last4 = Kossmann
| first4 = Donald
| title = Consistency rationing in the cloud: Pay only when it matters
| journal = Proceedings of the VLDB Endowment
| volume = 2
| issue = 1
| year = 2009
| doi = 10.14778/1687627.1687657
| pages = 253–264
}}
#* {{cite journal
| last1 = Naehrig
| first1 = Michael
| last2 = Lauter
| first2 = Kristin
| title = Can homomorphic encryption be practical?
| periodical = CCSW '11 Proceedings of the 3rd ACM Workshop on Cloud Computing Security Workshop
| year = 2011
| doi = 10.1145/2046660.2046682
| pages = 113–124
}}
#* {{cite report
| last1 = Abadi
| first1 = Daniel J.
| title = Data Management in the Cloud: Limitations and Opportunities
| citeseerx=10.1.1.178.200
| year = 2009
}}
#* {{cite journal
| last1 = Mowbray
| first1 = Miranda
| last2 = Pearson
| first2 = Siani
| title = A client-based privacy manager for cloud computing
| periodical = COMSWARE '09 Proceedings of the Fourth International ICST Conference on COMmunication System softWAre and middlewaRE
| year = 2009
| doi = 10.1145/1621890.1621897
}}
#* {{cite journal
| last1 = Juels
| first1 = Ari
| last2 = Oprea
| first2 = Alina
| title = New approaches to security and availability for cloud data
| journal = Communications of the ACM
| volume = 56
| issue = 2
| year = 2013
| doi = 10.1145/2408776.2408793
| pages = 64–73
}}
# Synchronization
#* {{cite book
| last1 = Uppoor
| first1 = S.
| last2 = Flouris
| first2 = M. D.
| last3 = Bilas
| first3 = A.
| title = 2010 IEEE International Conference on Cluster Computing Workshops and Posters (CLUSTER WORKSHOPS)
| chapter = Cloud-based synchronization of distributed file system hierarchies
| year = 2010
| doi = 10.1109/CLUSTERWKSP.2010.5613087
| pages = 1–4
| s2cid = 14577793
| isbn = 978-1-4244-8395-2
}}
# Economic aspects
#* {{cite journal
| last1 = Kaufman
| first1 = Lori M.
| title = Data Security in the World of Cloud Computing
| journal = IEEE Security & Privacy
| volume = 7
| issue = 4
| year = 2009
| doi = 10.1109/MSP.2009.87
| pages = 161–64
| s2cid = 16233643
}}
#* {{cite conference
| last1 = Marston
| first1 = Sean
| last2 = Li
| first2 = Zhi
| last3 = Bandyopadhyay
| first3 = Subhajyoti
| last4 = Zhang
| first4 = Juheng
| last5 = Ghalsasi
| first5 = Anand
| title = Cloud computing — The business perspective
| conference = Decision Support Systems Volume 51, Issue 1
| year = 2011
| doi = 10.1016/j.dss.2010.12.006
| pages = 176–189
}}
#* {{cite book
| last1 = Angabini
| first1 = A
| title = 2011 International Conference on P2P, Parallel, Grid, Cloud and Internet Computing
| last2 = Yazdani
| first2 = N
| last3 = Mundt
| first3 = T
| last4 = Hassani
| first4 = F
| year = 2011
| doi = 10.1109/3PGCIC.2011.37
| s2cid = 13393620
| pages =193–199
| chapter = Suitability of Cloud Computing for Scientific Data Analyzing Applications; an Empirical Study
| isbn = 978-1-4577-1448-1
}}
 
{{Cloud computing}}
 
[[Category:Cloud storage]]
[[Category:Cloud computing]]