Apache Cassandra: Difference between revisions


Comparison of the revision as of 00:19, 22 August 2014 by Textractor (edit summary: "added Druid data store") with the latest revision as of 15:03, 5 August 2025 by 66.150.142.108 (edit summary: "→Releases"). The 548 intermediate revisions by more than 100 users are not shown.
{{short description|Free and open-source database management system}}
{{External links|date=October 2023}}
{{Use mdy dates|date=September 2024}}
{{Use American English|date=September 2024}}
{{Infobox software
| name = Apache Cassandra
| logo = [[File:Cassandra logo.svg|frameless|Cassandra logo]]
| author = Avinash Lakshman, Prashant Malik / [[Facebook]]
| developer = [[Apache Software Foundation]]
| released = {{Start date and age|2008|07}}
| latest release version = {{wikidata|property|edit|reference|P548=Q2804309|P348}}
| latest release date = {{start date and age|{{wikidata|qualifier|mdy|P548=Q2804309|P348|P577}}}}
| programming language = [[Java (programming language)|Java]]
| operating system = [[Cross-platform]]
| language = English
| genre = [[NoSQL]] [[Database]], [[data store]]
| license = [[Apache License 2.0]]
}}
'''Apache Cassandra''' is a [[free and open-source software|free and open-source]] [[database management system]] designed to handle large volumes of data across multiple [[Commodity computing|commodity servers]].
The system prioritizes availability and [[scalability]] over [[consistency (database systems)|consistency]]. Its [[Log-structured merge-tree|LSM tree]]-based storage layer makes it particularly well suited to systems with high write-throughput requirements.<ref name="carpenter2022">{{cite book |last1=Carpenter |first1=Jeff |last2=Hewitt |first2=Eben |title=Cassandra: The Definitive Guide |edition=3rd |publisher=[[O'Reilly Media]] |year=2022 |isbn=978-1-4920-9710-5}}</ref> As a [[wide column store|wide-column database]], Cassandra supports flexible schemas and efficiently handles data models with numerous sparse columns. The system is optimized for applications with well-defined data access patterns that can be built into the schema design.<ref name="carpenter2022" /> Cassandra supports [[computer cluster]]s that may span multiple [[data center]]s,<ref>{{cite web |access-date=2013-07-25 |first=Joaquin |last=Casares |date=2012-11-05 |publisher=DataStax |title=Multi-datacenter Replication in Cassandra |quote=Cassandra's innate datacenter concepts are important as they allow multiple workloads to be run across multiple datacenters... |url=http://www.datastax.com/dev/blog/multi-datacenter-replication}}</ref> with [[Asynchrony (computer programming)|asynchronous]], masterless replication.
It enables [[Latency (engineering)|low-latency]] operations for all clients. It combines [[Amazon (company)|Amazon]]'s [[Dynamo (storage system)|Dynamo]] [[distributed storage]] and replication techniques with [[Google]]'s [[Bigtable]] data storage engine model.<ref>{{cite web |url=https://cassandra.apache.org/doc/latest/architecture/overview.html |title=Apache Cassandra Documentation Overview |access-date=2021-01-21}}</ref>

Cassandra also places a high value on performance. In 2012, University of Toronto researchers studying [[NoSQL]] systems concluded that "In terms of scalability, there is a clear winner throughout our experiments. Cassandra achieves the highest throughput for the maximum number of nodes in all experiments."<ref>{{cite web |access-date=2013-07-25 |first1=Tilmann |last1=Rabl |first2=Mohammad |last2=Sadoghi |first3=Hans-Arno |last3=Jacobsen |first4=Sergio |last4=Gomez-Villamor |first5=Victor |last5=Muntes-Mulero |first6=Serge |last6=Mankovskii |date=2012-08-27 |publisher=VLDB |title=Solving Big Data Challenges for Enterprise Application Performance Management |quote=In terms of scalability, there is a clear winner throughout our experiments. Cassandra achieves the highest throughput for the maximum number of nodes in all experiments... |url=http://vldb.org/pvldb/vol5/p1724_tilmannrabl_vldb2012.pdf}}</ref>

== History ==
Avinash Lakshman, a co-author of [[Amazon (company)|Amazon]]'s [[Dynamo (storage system)|Dynamo]], and Prashant Malik developed Cassandra at [[Facebook]] to power the [[inbox]] [[Search engine|search]] feature. Facebook released Cassandra as open-source software on [[Google Code]] in July 2008.<ref name=JH2008>{{cite web |access-date=2009-06-04 |date=July 12, 2008 |url=http://perspectives.mvdirona.com/2008/07/12/FacebookReleasesCassandraAsOpenSource.aspx |title=Facebook Releases Cassandra as Open Source |first=James |last=Hamilton}}</ref> In March 2009, it became an [[Apache Incubator]] project,<ref>{{cite web |date=2009-03-02 |title=Is this the new hotness now? |url=http://www.mail-archive.com/cassandra-dev@incubator.apache.org/msg00004.html |url-status=live |archive-url=https://web.archive.org/web/20100425071855/http://www.mail-archive.com/cassandra-dev%40incubator.apache.org/msg00004.html |archive-date=25 April 2010 |access-date=2010-03-29 |publisher=Mail-archive.com}}</ref> and on February 17, 2010, it graduated to a top-level project.<ref name=GRAD>{{cite web |url=http://www.mail-archive.com/cassandra-dev@incubator.apache.org/msg01518.html |title=Cassandra is an Apache top level project |publisher=Mail-archive.com |date=2010-02-18 |access-date=2010-03-29 |archive-url=https://web.archive.org/web/20100328090322/http://www.mail-archive.com/cassandra-dev%40incubator.apache.org/msg01518.html |archive-date=28 March 2010 |url-status=live}}</ref>

The developers at Facebook named their database after [[Cassandra]], the [[mythological]] [[Troy|Trojan]] prophetess, in reference to her curse: her prophecies were never believed.<ref>{{cite web |url=http://kellabyte.com/2013/01/04/the-meaning-behind-the-name-of-apache-cassandra/ |archive-url=https://web.archive.org/web/20161101091045/http://kellabyte.com/2013/01/04/the-meaning-behind-the-name-of-apache-cassandra |archive-date=2016-11-01 |title=The meaning behind the name of Apache Cassandra |access-date=2016-07-19 |quote=Apache Cassandra is named after the Greek mythological prophet Cassandra. [...] Because of her beauty Apollo granted her the ability of prophecy. [...] When Cassandra of Troy refused Apollo, he put a curse on her so that all of her and her descendants' predictions would not be believed. [...] Cassandra is the cursed Oracle[.] |url-status=dead}}</ref>

== Features and limitations ==
Cassandra uses a [[distributed architecture]] in which all nodes perform identical functions, so there is no single point of failure. Configurable replication strategies distribute data across the cluster, which provides redundancy and disaster-recovery capabilities. The system can scale linearly: read and write throughput increase as nodes are added, and service continues without interruption.
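The sketch below is a rough illustration of how a masterless cluster can spread data and replicas without a coordinator node. It is a simplified model that loosely follows Cassandra's hash-based token ring and the placement rule of the SimpleStrategy replication strategy. Under that rule, the first replica goes to the node that owns the key's token, and further replicas go to the next nodes clockwise. The names `token_for` and `TokenRing`, the node names, and the evenly spaced four-node layout are hypothetical. Real clusters add virtual nodes, rack and data-center awareness, and other details that are omitted here.

```python
import bisect
import hashlib

# Token space modeled loosely on RandomPartitioner (0 .. 2**127); a simplification for this sketch.
RING_SIZE = 2 ** 127


def token_for(key: str) -> int:
    """Map a partition key to a ring position by hashing it with MD5."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE


class TokenRing:
    """Hypothetical token ring: each node owns the range that ends at its token."""

    def __init__(self, node_tokens: dict[str, int]):
        # Sort nodes by token so the ring can be walked clockwise.
        self._ring = sorted((token, node) for node, token in node_tokens.items())
        self._tokens = [token for token, _ in self._ring]

    def replicas(self, key: str, replication_factor: int) -> list[str]:
        """First replica owns the key's token; the others are the next nodes clockwise."""
        if not 1 <= replication_factor <= len(self._ring):
            raise ValueError("replication factor must be between 1 and the node count")
        # First node whose token is >= the key's token, wrapping past the highest token.
        start = bisect.bisect_left(self._tokens, token_for(key)) % len(self._ring)
        return [self._ring[(start + i) % len(self._ring)][1]
                for i in range(replication_factor)]


if __name__ == "__main__":
    # Four hypothetical nodes with evenly spaced tokens.
    ring = TokenRing({f"node{i}": i * RING_SIZE // 4 for i in range(4)})
    for key in ("user:42", "user:43", "order:7"):
        print(key, "->", ring.replicas(key, replication_factor=3))
```

Every node can run the same computation, so any node can locate a key's replicas. This is one way to read the claim that all nodes perform identical functions.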
In terms of the [[CAP theorem]], Cassandra is usually classified as an AP system: it favors [[Availability (system)|availability]] and partition tolerance over [[Consistency (database systems)|consistency]]. Although it offers tunable consistency levels for both reads and writes, its architecture makes it less suitable for use cases that require strict consistency guarantees.<ref name="carpenter2022" /> Cassandra's compatibility with [[Apache Hadoop|Hadoop]] and related tools allows it to be integrated into existing big data processing workflows.

Inserts and updates are both [[UPSERT|upserts]], and deletes are written as [[Tombstone (data store)|tombstones]]. Like any other write, a tombstone propagates to every replica, which lets the cluster reach [[eventual consistency]]. Cassandra does not support [[Join (SQL)|joins]], [[Correlated subquery|subqueries]], ad hoc aggregations, or other complex queries, except through batch analysis with Hadoop.<ref name="carpenter2022" /> Instead, it emphasizes [[denormalization]], supported by features such as collections.<ref>{{cite web |access-date=2013-07-25 |first=Sylvain |last=Lebresne |date=2012-08-05 |title=Coming in 1.2: Collections support in CQL3 |publisher=DataStax |url=http://www.datastax.com/dev/blog/cql3_collections}}</ref> These limitations stem from its distributed architecture, which is optimized for scalability and availability rather than for complex queries.

== Releases ==
Releases after graduation include:
* 0.6, released April 12, 2010, added support for integrated caching and [[Apache Hadoop]] [[MapReduce]].<ref>[https://blogs.apache.org/foundation/entry/the_apache_software_foundation_announces3 The Apache Software Foundation Announces Apache Cassandra Release 0.6 : The Apache Software Foundation Blog]</ref>
* 0.7, released January 8, 2011, added secondary indexes and online schema changes.<ref>[https://blogs.apache.org/foundation/entry/the_apache_software_foundation_announces9 The Apache Software Foundation Announces Apache Cassandra 0.7 : The Apache Software Foundation Blog]</ref>
* 0.8, released June 2, 2011, added the Cassandra Query Language (CQL), self-tuning memtables, and support for zero-downtime upgrades.<ref>[http://grokbase.com/t/cassandra/user/1162fkpwx2/release-0-8-0 &#91;Cassandra-user&#93; &#91;RELEASE&#93; 0.8.0 - Grokbase]</ref>
* 1.0, released October 17, 2011, added integrated compression, leveled compaction, and improved read performance.<ref>[http://www.infoq.com/news/2011/10/Cassandra-1 Cassandra 1.0.0. Is Ready for the Enterprise]</ref>
* 1.1, released April 23, 2012, added self-tuning caches, row-level isolation, and support for mixed SSD/spinning-disk deployments.<ref>[https://blogs.apache.org/foundation/entry/the_apache_software_foundation_announces26 The Apache Software Foundation Announces Apache Cassandra™ v1.1 : The Apache Software Foundation Blog]</ref>
* 1.2, released January 2, 2013, added clustering across virtual nodes, inter-node communication, atomic batches, and request tracing.<ref>[https://blogs.apache.org/foundation/entry/the_apache_software_foundation_announces38 The Apache Software Foundation Announces Apache Cassandra™ v1.2]</ref>
* 2.0, released September 4, 2013, added lightweight transactions (based on the [[Paxos (computer science)|Paxos]] consensus protocol), triggers, and improved compaction.
* 2.0.4, released December 30, 2013, added the ability to specify which data centers participate in a repair, client encryption support for sstableloader, and the ability to remove snapshots of column families that no longer exist.<ref>[http://qnalist.com/questions/4662083/release-apache-cassandra-2-0-4 &#91;Cassandra-user&#93; &#91;RELEASE&#93; Apache Cassandra 2.0.4]</ref>

== Data model ==
As a [[wide column store|wide-column store]], Cassandra combines features of both key-value and tabular database systems. It implements a partitioned row store model with adjustable consistency levels.<ref name="tunable_consistency">{{cite web |access-date=2013-07-25 |author=DataStax |author-link=DataStax |date=2013-01-15 |title=About data consistency |url=http://www.datastax.com/docs/1.2/dml/data_consistency |archive-url=https://web.archive.org/web/20130726185743/http://www.datastax.com/docs/1.2/dml/data_consistency |archive-date=2013-07-26 |url-status=dead}}</ref> The following table compares Cassandra with [[relational database management systems]] (RDBMS).

{| class="wikitable"
|+ Data Model Comparison: Cassandra vs RDBMS
! Feature !! Cassandra !! RDBMS
|-
| Organization || Keyspace → Table → Row || Database → Table → Row
|-
| Row Structure || Dynamic columns || Fixed schema
|-
| Column Data || Name, type, value, timestamp || Name, type, value
|-
| Schema Changes || Runtime modifications || Usually requires downtime
|-
| Data Model || Denormalized || Normalized with JOINs
|}

The data model consists of several hierarchical components:

=== Keyspace ===
A keyspace in Cassandra is analogous to a database in [[relational database management system|relational systems]]. It contains multiple tables and holds configuration information, including the replication strategy and user-defined types (UDTs).<ref name="carpenter2022" />

=== Tables ===
Tables (called [[Column family|column families]] before CQL 3) are containers for rows of data. Each table has a name and configuration information for its stored data. Tables may be created, dropped, or altered at run time without blocking [[Update (SQL)|updates]] and queries.<ref>{{cite web |access-date=2013-07-25 |first=Jonathan |last=Ellis |date=2012-03-02 |title=The Schema Management Renaissance in Cassandra 1.1 |publisher=DataStax |url=http://www.datastax.com/dev/blog/the-schema-management-renaissance}}</ref>

=== Rows and columns ===
Each row is identified by a [[primary key]] and contains columns.
The first component of a table's primary key is the partition key. Within a partition, rows are [[Clustered index|clustered]] by the remaining columns of the key.<ref>{{cite web |access-date=2013-07-25 |first=Jonathan |last=Ellis |date=2012-02-15 |title=Schema in Cassandra 1.1 |publisher=DataStax |url=http://www.datastax.com/dev/blog/schema-in-cassandra-1-1}}</ref> Columns contain data belonging to a row and consist of:
* A name
* A type
* A value
* Timestamp metadata, used to resolve conflicting writes on a "last write wins" basis

Rows in a traditional RDBMS table share one fixed set of columns. In Cassandra, rows in the same table can have different columns, because not every column needs to be specified for every row.<ref name="carpenter2022" /> Columns outside the primary key can also be indexed separately.<ref>{{cite web |access-date=2013-07-25 |first=Jonathan |last=Ellis |date=2010-12-03 |title=What's new in Cassandra 0.7: Secondary indexes |publisher=DataStax |url=http://www.datastax.com/dev/blog/whats-new-cassandra-07-secondary-indexes}}</ref>
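The timestamp metadata described above is what lets replicas that disagree be reconciled. The sketch below is a hypothetical model, not Cassandra's code; `Cell`, `reconcile`, and `quorum` are invented names. It merges copies of one row returned by several replicas and keeps the newest version of each column. It also computes the majority size that quorum-level reads and writes rely on. Real Cassandra does this per cell inside its storage engine. This sketch simply keeps the first version seen when two timestamps tie.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """One column value as returned by a replica (hypothetical model)."""
    name: str
    value: object
    timestamp: int  # write time supplied with the write, e.g. microseconds since the epoch


def reconcile(*replica_rows: dict[str, Cell]) -> dict[str, Cell]:
    """Merge copies of one row from several replicas, column by column: newest timestamp wins."""
    merged: dict[str, Cell] = {}
    for row in replica_rows:
        for name, cell in row.items():
            current = merged.get(name)
            # On a timestamp tie this sketch keeps the version seen first (a simplification).
            if current is None or cell.timestamp > current.timestamp:
                merged[name] = cell
    return merged


def quorum(replication_factor: int) -> int:
    """A quorum is a majority of the replicas."""
    return replication_factor // 2 + 1


if __name__ == "__main__":
    replica_a = {"email": Cell("email", "old@example.com", 100),
                 "city": Cell("city", "Oslo", 100)}
    replica_b = {"email": Cell("email", "new@example.com", 200)}  # has no "city" column
    row = reconcile(replica_a, replica_b)
    print({name: cell.value for name, cell in row.items()})
    # With RF = 3, quorum writes and quorum reads each touch 2 replicas; 2 + 2 > 3,
    # so every quorum read overlaps at least one replica that saw the latest quorum write.
    print(quorum(3), quorum(3) + quorum(3) > 3)
```

Because the merge is per column, a replica that never received one column can still contribute the newest value of another. This matches the flexible, sparse rows described in this section.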
== Storage model ==
Cassandra uses a [[Log-structured merge-tree|log-structured merge tree (LSM tree)]] index to optimize write throughput, in contrast to the [[B tree indexing|B-tree indexes]] used by most databases.<ref name="carpenter2022" />

{| class="wikitable"
|+ Storage Model Comparison: Cassandra vs RDBMS
! Feature !! Cassandra !! RDBMS
|-
| Index Structure || LSM Tree || B-Tree
|-
| Write Process || Append-only with Memtable || In-place updates
|-
| Storage Components || Commit Log, Memtable, SSTable || Data files, Transaction Log
|-
| Update Strategy || New entry for each change || Modify existing data
|-
| Delete Handling || Tombstone markers || Direct removal
|-
| Primarily Optimized For || Writes || Reads
|}

The storage architecture consists of three main components:<ref name="carpenter2022" />
=== Core components ===
* '''Commit Log''': A [[Write-ahead logging|write-ahead log]] that ensures write durability
* '''Memtable''': An [[In-memory processing|in-memory]] data structure that stores writes, sorted by primary key
* '''SSTable''' (Sorted String Table): Immutable files containing data flushed from Memtables

=== Write and read processes ===
Write operations follow a two-stage process:
# The write is recorded in the commit log and added to the Memtable.
# When the Memtable reaches a size or time threshold, it is flushed to a new SSTable.

Read operations:
# Check the Memtable for the latest data.
# Search the SSTables from newest to oldest, using [[Bloom filter]]s to skip SSTables that cannot contain the requested data.

=== Data management ===
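A toy, single-node model can tie together the write and read steps above with the tombstones and compaction covered below. The sketch is hypothetical: the class `ToyLSMStore`, the `TOMBSTONE` sentinel, and the tiny flush threshold are invented for illustration. It omits Bloom filters, per-cell timestamps, replication, and on-disk files. It shows only the order of operations: log the write, update the Memtable, flush to an immutable sorted SSTable, read from newest to oldest, and compact away overwritten values and tombstones.

```python
import bisect
from typing import Optional

TOMBSTONE = object()  # sentinel written in place of a value when a key is deleted


class ToyLSMStore:
    """Hypothetical single-node LSM store: commit log, memtable, immutable sorted SSTables."""

    def __init__(self, memtable_limit: int = 2):
        self.commit_log: list[tuple[str, object]] = []      # append-only write-ahead log
        self.memtable: dict[str, object] = {}               # latest writes, in memory
        self.sstables: list[list[tuple[str, object]]] = []  # oldest first, each sorted by key
        self.memtable_limit = memtable_limit

    def _write(self, key: str, value: object) -> None:
        self.commit_log.append((key, value))  # 1. record the write durably
        self.memtable[key] = value            # 2. add it to the memtable
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def put(self, key: str, value: object) -> None:
        self._write(key, value)

    def delete(self, key: str) -> None:
        self._write(key, TOMBSTONE)  # a delete is just another write

    def flush(self) -> None:
        """Write the memtable out as a new immutable, key-sorted SSTable."""
        if self.memtable:
            self.sstables.append(sorted(self.memtable.items()))
            self.memtable = {}
            self.commit_log.clear()  # flushed writes no longer need replaying

    def get(self, key: str) -> Optional[object]:
        """Return the newest value for key, or None if it is missing or deleted."""
        if key in self.memtable:
            value = self.memtable[key]
        else:
            value = TOMBSTONE  # "not found" unless some SSTable holds the key
            # Newest to oldest; each entry holds a whole value, so the first hit is the latest.
            # (A real store would first consult each SSTable's Bloom filter.)
            for table in reversed(self.sstables):
                i = bisect.bisect_left(table, (key,))
                if i < len(table) and table[i][0] == key:
                    value = table[i][1]
                    break
        return None if value is TOMBSTONE else value

    def compact(self) -> None:
        """Merge all SSTables into one, keeping the newest value per key and dropping tombstones."""
        merged: dict[str, object] = {}
        for table in self.sstables:  # oldest first, so newer entries overwrite older ones
            merged.update(table)
        # Cassandra keeps tombstones for a grace period (gc_grace_seconds) so they reach every
        # replica; this single-node toy drops them immediately.
        live = [(k, v) for k, v in sorted(merged.items()) if v is not TOMBSTONE]
        self.sstables = [live] if live else []


if __name__ == "__main__":
    store = ToyLSMStore(memtable_limit=2)
    store.put("a", 1)
    store.put("b", 2)   # memtable full: flushed to the first SSTable
    store.put("a", 10)
    store.delete("b")   # flushed again; the second SSTable holds a tombstone for "b"
    print(store.get("a"), store.get("b"), len(store.sstables))  # 10 None 2
    store.compact()
    print(store.sstables)  # [[('a', 10)]]
```

The example also shows why tombstones can hurt delete-heavy workloads. Until compaction runs, a deleted key still occupies space in every SSTable that mentions it, and reads must still find the tombstone to know the key is gone.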
==== Tombstones ====
Every operation (create, update, or delete) generates a new entry. Deletes are recorded as "[[Tombstone (data store)|tombstones]]". Tombstones are common in many databases, but they can degrade performance in delete-heavy workloads.<ref>{{cite web |last1=Rodriguez |first1=Alain |title=About Deletes and Tombstones in Cassandra |url=https://thelastpickle.com/blog/2016/07/27/about-deletes-and-tombstones.html |date=July 27, 2016}}</ref>

==== Compaction ====
Compaction consolidates multiple SSTables to:
* Reduce storage usage
* Remove tombstones for deleted rows
* Improve read performance

==== Partitioners ====
An important decision when designing a Cassandra cluster is the choice of partitioner. The partitioner controls how data is distributed across the nodes, and it cannot be changed later without wiping the data. Early releases offered a choice between two main partitioners:<ref>{{cite web |access-date=2011-03-23 |first=Dominic |last=Williams |publisher=WordPress.com |title=Cassandra: RandomPartitioner vs OrderPreservingPartitioner |quote=When building a Cassandra cluster, the “key” question (sorry, that’s weak) is whether to use the RandomPartitioner (RP), or the OrderPreservingPartitioner (OPP). These control how your data is distributed over your nodes. Once you have chosen your partitioner, you cannot change without wiping your data, so think carefully! The problem with OPP: If the distribution of keys used by individual column families is different, their sets of keys will not fall evenly across the ranges assigned to nodes. Thus nodes will end up storing preponderances of keys (and the associated data) corresponding to one column family or another. If as is likely column families store differing quantities of data with their keys, or store data accessed according to differing usage patterns, then some nodes will end up with disproportionately more data than others, or serving more “hot” data than others. |url=http://ria101.wordpress.com/2010/02/22/cassandra-randompartitioner-vs-orderpreservingpartitioner/}}</ref>
# RandomPartitioner (RP): distributes key-value pairs pseudo-randomly across the nodes by hashing the keys, which balances load well. Compared with OPP, more nodes must be accessed to retrieve a set of consecutive keys.
# OrderPreservingPartitioner (OPP): distributes key-value pairs in key order, so that similar keys are stored close together and fewer nodes must be accessed. The drawback is an uneven distribution of key-value pairs across the nodes.

== Cassandra Query Language ==
Cassandra Query Language (CQL) is the interface for accessing Cassandra. Its syntax resembles that of [[SQL|Structured Query Language]] (SQL), but it is a separate language. CQL adds an [[abstraction layer]] that hides implementation details of the underlying storage structure, and it provides native syntax for collections and other common encodings.
Language drivers are available for [[Java (programming language)|Java]] ([[Java Database Connectivity|JDBC]]), [[Python (programming language)|Python]] (DBAPI2), [[Node.js|Node.JS]] ([[DataStax]]), [[Go (programming language)|Go]] (gocql), and [[C++]].<ref>{{cite web |title=DataStax C/C++ Driver for Apache Cassandra |url=https://github.com/datastax/cpp-driver |access-date=15 December 2014 |work=DataStax}}</ref>

== Prominent users ==
* [[@WalmartLabs]]<ref>[http://www.walmartlabs.com Walmart Labs]</ref> (previously [[Kosmix]]) uses Cassandra with SSDs.
* [[AppScale]] uses Cassandra as a back end for Google App Engine applications.<ref>{{cite web |title=Datastores on Appscale |url=http://appscale.cs.ucsb.edu/datastores.html#cassandra}}</ref>
* [[CERN]] uses Cassandra for its [[ATLAS experiment]] to archive the online DAQ system's monitoring information.<ref name=CERN-ATLAS>{{cite web |url=https://cdsweb.cern.ch/record/1432912 |title=A Persistent Back-End for the ATLAS Online Information Service (P-BEAST)}}</ref>
* [[Cisco]]'s [[WebEx]] uses Cassandra to store user feeds and activity in near real time.<ref name=CISCO>{{cite web |url=http://www.mail-archive.com/cassandra-dev@incubator.apache.org/msg01163.html |title=Re: Cassandra users survey |publisher=Mail-archive.com |date=2009-11-21 |access-date=2010-03-29 |archive-url=http://web.archive.org/web/20100417083733/http://www.mail-archive.com/cassandra-dev@incubator.apache.org/msg01163.html |archive-date=17 April 2010 |url-status=live}}</ref>
* [[Cloudkick]] uses Cassandra to store its users' server metrics.<ref>[https://www.cloudkick.com/blog/2010/mar/02/4_months_with_cassandra/ 4 Months with Cassandra, a love story | Cloudkick, manage servers better]</ref>
* [[Constant Contact]] uses Cassandra in its email and social media marketing applications.<ref name=ConstantContact>{{cite web |url=http://www.readwriteweb.com/enterprise/2011/02/this-week-in-consolidation-hp.php |title=This Week in Consolidation: HP Buys Vertica, Constant Contact Buys Bantam Live and More |first=Klint |last=Finley |publisher=Read Write Enterprise |date=2011-02-18}}</ref> It has deployed over 200 nodes.
* [[Digg]], a large social news website, announced on September 9, 2009, that it was rolling out Cassandra,<ref name=NM2009>{{cite web |url=http://blog.digg.com/?p=966 |title=Looking to the future with Cassandra |first=Ian |last=Eure}}</ref> and confirmed this on March 8, 2010.<ref name=DG2010>{{cite web |url=http://about.digg.com/node/564 |title=Saying Yes to NoSQL; Going Steady with Cassandra |first=John |last=Quinn}}</ref> [[TechCrunch]] later linked Cassandra to reliability criticisms of Digg v4 and to the company's struggles.<ref name=ES2010>{{cite web |url=http://techcrunch.com/2010/09/07/digg-struggles-vp-engineering-door/ |title=As Digg Struggles, VP Of Engineering Is Shown The Door |first=Erick |last=Schonfeld}}</ref> Lead engineers at Digg dismissed these criticisms as a red herring and blamed a lack of load testing.<ref name=QU2010>{{cite web |url=http://www.quora.com/Is-Cassandra-to-blame-for-Digg-v4s-technical-failures/ |title=Is Cassandra to Blame for Digg v4's Failures?}}</ref>
* [[Facebook]] used Cassandra to power Inbox Search, with over 200 nodes deployed.<ref name=FBIS>{{cite web |url=http://www.facebook.com/note.php?note_id=24413138919&id=9445547199&index=9 |title=Niet compatibele browser |trans-title=Incompatible browser |language=nl |publisher=Facebook |access-date=2010-03-29}}</ref> Facebook abandoned this use in late 2010, when it built its messaging platform on [[HBase]].<ref name=KM2010>{{cite web |url=http://www.facebook.com/notes/facebook-engineering/the-underlying-technology-of-messages/454991608919 |title=The Underlying Technology of Messages |first=Kannan |last=Muthukkaruppan}}</ref>
* [[Formspring]] uses Cassandra to count responses and to store social graph data (followers, following, blockers, and blocking) for 26 million accounts with 10 million responses a day.<ref>{{cite web |url=http://www.slideshare.net/martincozzi/cassandra-formspring |date=2011-08-31 |first=Martin |last=Cozzi |title=Cassandra at Formspring}}</ref>
* [[IBM]] has researched building a scalable email system based on Cassandra.<ref name=IBM>{{cite web |url=http://ewh.ieee.org/r6/scv/computer/nfic/2009/IBM-Jun-Rao.pdf |title=BlueRunner: Building an Email Service in the Cloud |publisher=ieee.org |date=2009-07-20 |access-date=2010-03-29}}</ref>
* [[Mahalo.com]] uses Cassandra to record user activity logs and topics for its Q&A website.<ref name="Mahalo">{{cite web |title=Mahalo.com powered by Apache Cassandra™ |date=2012-04-10 |access-date=2014-06-13 |website=DataStax.com |publisher=[[DataStax]] |location=Santa Clara, CA, USA |url=http://www.datastax.com/wp-content/uploads/2011/06/DataStax-CaseStudy-Mahalo.pdf}}</ref><ref name=Mahalo2>[http://blip.tv/datastax/cassandra-at-mahalo-com-4030941 Watch Cassandra at Mahalo.com | DataStax Episodes | Blip]</ref>
* [[Netflix]] uses Cassandra as the back-end database for its streaming services.<ref name="Netflix1">{{cite web |url=http://www.slideshare.net/adrianco/migrating-netflix-from-oracle-to-global-cassandra |title=Migrating Netflix from Datacenter Oracle to Global Cassandra |first=Adrian |last=Cockcroft |date=2011-07-11 |access-date=2014-06-13 |website=slideshare.net}}</ref><ref name=Netflix2>{{cite web |url=http://techblog.netflix.com/2011/01/nosql-at-netflix.html |date=2011-01-28 |first=Yury |last=Izrailevsky |title=NoSQL at Netflix}}</ref>
* [[Ooyala]] built a scalable, flexible, real-time analytics engine using Cassandra.<ref name="Ooyala">{{cite web |title=Designing a Scalable Database for Online Video Analytics |url=http://www.datastax.com/wp-content/uploads/2011/04/WP-Ooyala.pdf |website=DataStax.com |location=Mountain View CA, USA |author=Ooyala |author-link=Ooyala |date=2010-05-18 |access-date=2014-06-14}}</ref>
* [[Openwave]] uses Cassandra as a distributed database and as a distributed storage mechanism for its next-generation messaging platform.<ref name="Openwave">{{cite web |title=DataStax Case Study of Openwave Messaging |url=http://www.datastax.com/wp-content/uploads/2011/05/DataStax-CaseStudy-Openwave.pdf |author=Mainstay LLC |date=2013-11-11 |access-date=2014-06-15 |website=DataStax.com |publisher=[[DataStax]] |location=Santa Clara, CA, USA}}</ref>
* [[OpenX (software)|OpenX]] runs over 130 nodes on Cassandra for its OpenX Enterprise product, storing and replicating advertisements and targeting data for ad delivery.<ref name=OpenX>[http://openx.com/publisher/technology Ad Serving Technology - Advanced Optimization, Forecasting, & Targeting | OpenX]</ref>
* [[Plaxo]] has "reviewed 3 billion contacts in [their] database, compared them with publicly available data sources, and identified approximately 600 million unique people with contact info."<ref name=Plaxo>{{cite web |url=http://blog.plaxo.com/2011/03/an-important-milestone-and-its-only-the-beginning/ |title=An important milestone - and it's only the beginning! |date=2011-03-20 |first=Preston |last=Smalley}}</ref>
* [[PostRank]] uses Cassandra as its back-end database.<ref name=PostRank>{{cite web |url=http://blog.postrank.com/2011/03/webpulp-tv-scaling-postrank-with-ilya-grigorik/ |first=Ilya |last=Grigorik |date=2011-03-29 |title=Webpulp TV: Scaling PostRank with Ilya Grigorik}}</ref>
* [[Rackspace]] is known to use Cassandra internally.<ref name=Rackspace>{{cite web |url=http://www.slideshare.net/stuhood/hadoop-and-cassandra-at-rackspace |title=Hadoop and Cassandra (at Rackspace) |publisher=Stu Hood |date=2010-04-23 |access-date=2011-09-01}}</ref>
* [[Reddit]] switched to Cassandra from [[memcacheDB]] on March 12, 2010,<ref name=REDDIT>{{cite web |author=david [ketralnis] |url=http://blog.reddit.com/2010/03/she-who-entangles-men.html |title=what's new on reddit: She who entangles men |publisher=blog.reddit |date=2010-03-12 |access-date=2010-03-29 |archive-url=http://web.archive.org/web/20100325115755/http://blog.reddit.com/2010/03/she-who-entangles-men.html |archive-date=25 March 2010 |url-status=live}}</ref> and experienced some problems in May 2010 because its cluster had too few nodes.<ref name=REDDIT2>{{cite web |author=the reddit admins |url=http://blog.reddit.com/2010/05/reddits-may-2010-state-of-servers.html |title=blog.reddit -- what's new on reddit: reddit's May 2010 "State of the Servers" report |publisher=blog.reddit |date=2010-05-11 |access-date=2010-05-16 |archive-url=http://web.archive.org/web/20100514085008/http://blog.reddit.com/2010/05/reddits-may-2010-state-of-servers.html |archive-date=14 May 2010 |url-status=live}}</ref>
* [[RockYou]] uses Cassandra to record every click for 50 million monthly active users in real time for its online games<ref name=RockYou>{{cite web |url=http://mysqldba.blogspot.com/2010/03/cassandra-is-my-nosql-solution-but.html |date=2011-03-23 |first=Dathan Vance |last=Pattishall |title=Cassandra is my NoSQL
Solution but}}</ref> [[SoundCloud]] uses Cassandra to store the dashboard of their users<ref name=SoundCloud>{{cite web\|url=http://backstage.soundcloud.com/2011/04/failing-with-mongodb/\|title=Cassandra at SoundCloud}}</ref> [[Talentica Software]] uses Cassandra as a back-end for Analytics Application with Cassandra cluster of 30 nodes and inserting around 200GB data on daily basis.<ref>cite web\|url=http://www.talentica.com{{Failed verification\|date=June 2014}}</ref>{{Failed verification\|date=June 2014}} [[Twitter]] announced it is planning to use Cassandra because it can be run on large server clusters and is capable of taking in very large amounts of data at a time.<ref name=TWITTER>{{cite web\|last=Popescu \|first=Alex \|url=http://nosql.mypopescu.com/post/407159447/cassandra-twitter-an-interview-with-ryan-king \|title=Cassandra @ Twitter: An Interview with Ryan King \|publisher=myNoSQL \|date= \|accessdate=2010-03-29\|archiveurl= http://web.archive.org/web/20100301151656/http://nosql.mypopescu.com/post/407159447/cassandra-twitter-an-interview-with-ryan-king\|archivedate= 1 March 2010 <!--DASHBot-->\|deadurl= no}}</ref><ref name=TWITTER2>{{cite web\|last=Babcock \|first=Charles \|url=http://www.informationweek.com/news/software/open_source/showArticle.jhtml?articleID=223100894&pgno=1&queryText=&isPrev= \|title=Twitter Drops MySQL For Cassandra - Cloud databases \|publisher=InformationWeek \|date= \|accessdate=2010-03-29\|archiveurl= http://web.archive.org/web/20100402075726/http://www.informationweek.com/news/software/open_source/showArticle.jhtml?articleID=223100894&pgno=1&queryText=&isPrev=\|archivedate= 2 April 2010 <!--DASHBot-->\|deadurl= no}}</ref> Twitter continues to use it but not for Tweets themselves.<ref name="King">{{cite web \|url=https://blog.twitter.com/2010/cassandra-twitter-today \|title=Cassandra at Twitter Today \|first=Ryan \|last=King \|website=blog.twitter.com \|date=2010-07-10 \|accessdate=2014-06-20 \|publisher=[[Twitter]] 
\|location=San Francisco, CA, USA}}</ref>
* [[Urban Airship]] uses Cassandra for its mobile service hosting, which covers over 160 million application installs across 80 million unique devices.<ref name=UrbanAirship>{{cite web\|url=http://www.slideshare.net/eonnen/from-100s-to-100s-of-millions \|title=From 100s to 100s of Millions \|first=Erik \|last=Onnen}}</ref>
* [[Zoho]] uses Cassandra to generate the inbox preview in its [[Zoho#Zoho Mail\|Zoho Mail]] service.

[[Facebook]] moved off its pre-Apache Cassandra deployment in late 2010, when it replaced Inbox Search with the Facebook Messaging platform.<ref name="KM2010"/> In 2012, Facebook began using Apache Cassandra in its Instagram unit.<ref>{{cite web \|accessdate=2013-07-25 \|author=Rick Branson \|date=2013-06-26 \|title=Cassandra at Instagram \|publisher=DataStax \|url=http://www.youtube.com/watch?v=xDtclzE4ydA}}</ref>

According to DB-Engines, Cassandra is the most popular wide column store.<ref name=DB-Engines>{{cite web\|url=http://db-engines.com/en/ranking/wide+column+store \|title=DB-Engines Ranking of Wide Column Stores \|author=DB-Engines}}</ref>

~~== See also ==~~
~~{{Portal\|Free software}}~~
* [[Apache Accumulo]] - Secure [[Hadoop\|Apache Hadoop]]-based distributed database
* [[Berkeley db\|Berkeley DB]]
* [[DataStax]]
* [[Druid (open-source data store)]]
* [[MongoDB]]
* [[BigTable]] - Original distributed database by Google
* [[Riak]]
* [[Distributed database]]
* [[Distributed hash table]] (DHT)
* [[Dynamo (storage system)]] - Cassandra borrows many elements from Dynamo
* [[HBase\|Apache HBase]] - [[Hadoop\|Apache Hadoop]]-based distributed database, very similar to BigTable
* [[Hypertable]] - [[Hadoop\|Apache Hadoop]]-based distributed database, very similar to BigTable
* [[NoSQL]]

~~==References==~~
~~{{Reflist\|2}}~~

A keyspace in Cassandra is a namespace that defines how data is replicated across nodes, so replication is configured at the keyspace level. The following CQL 3.0 example creates a keyspace and a column family, inserts a row, and reads it back:<ref>{{cite web \|title=CQL \|url=https://cassandra.apache.org/doc/cql3/CQL.html \|url-status=dead \|archive-url=https://web.archive.org/web/20160113141740/http://cassandra.apache.org/doc/cql3/CQL.html \|archive-date=13 January 2016 \|access-date=5 January 2016}}</ref>
<syntaxhighlight lang="mysql">
-- Keyspace whose data is stored on three replicas
CREATE KEYSPACE MyKeySpace
  WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 3 };

-- Make it the default keyspace for the statements that follow
USE MyKeySpace;

-- Column family (table) whose primary key is "id"
CREATE COLUMNFAMILY MyColumns (id text, lastName text, firstName text, PRIMARY KEY(id));

INSERT INTO MyColumns (id, lastName, firstName) VALUES ('1', 'Doe', 'John');

SELECT * FROM MyColumns;
</syntaxhighlight>
Which gives:
<syntaxhighlight lang="text">
 id \| lastName \| firstName
----+----------+----------
  1 \|      Doe \|      John

(1 rows)
</syntaxhighlight>

== Distributed architecture ==

=== Gossip protocol ===
Cassandra uses a peer-to-peer gossip protocol for cluster communication. Nodes routinely exchange information about cluster state, including:
* Node availability status
* Schema versions
* Generation timestamps (node bootstrap time)
* Version numbers (logical clock values)

The system uses [[vector clock]]s to determine which state information is most current and to ignore outdated state.<ref name="carpenter2022" />

=== Seed nodes ===
The architecture designates certain nodes as "seed" nodes, which:
* Bootstrap the cluster
* Serve as guaranteed gossip communication points
* Prevent cluster fragmentation
* Remain discoverable via service discovery methods

This design avoids single points of failure while keeping operational knowledge consistent across the cluster.<ref name="carpenter2022" />

=== Fault tolerance ===
Cassandra employs the Phi Accrual Failure Detector to manage node failures during cluster operation.<ref>{{cite conference \| title = The Φ Accrual Failure Detector \| first1 = Naohiro \| last1 = Hayashibara \| first2 = Xavier \| last2 = Défago \| first3 = Rami \| last3 = Yared \| first4 = Takuya \| last4 = Katayama \| book-title = IEEE Symposium on Reliable Distributed Systems \| year = 2004 \| pages = 66–78 \| doi = 10.1109/RELDIS.2004.1353004 }}</ref> With this detector, each node independently assesses the availability of the other nodes during gossip communication. When a node stops responding long enough for its suspicion level to cross a threshold, it is "convicted" and removed from write operations, though it can rejoin the cluster once its heartbeat resumes.<ref name="carpenter2022" />

To maintain data integrity during node outages, Cassandra uses a "hinted handoff" mechanism. When a write targets an offline node, the coordinator node temporarily stores the write as a "hint"; once the offline node returns to service, the hints are forwarded to it to restore consistency. Cassandra permanently removes nodes only through explicit administrative decommissioning or rebuilding, so temporary communication failures or restarts do not trigger unnecessary data rebalancing.<ref name="carpenter2022" />

==Management and monitoring==
Cassandra is a Java-based system that can be managed and monitored via [[Java Management Extensions]] (JMX).
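The failure detection and hinted handoff described under ''Fault tolerance'' can be illustrated with a minimal Python sketch. It is not Cassandra's implementation, and the class names <code>PhiAccrualDetector</code> and <code>Coordinator</code> are hypothetical. The sketch assumes heartbeat intervals are exponentially distributed (the original Φ accrual paper models them with a normal distribution), in which case φ reduces to the elapsed silence divided by the mean interval times ln 10. The conviction threshold of 8 is an assumption chosen to match the default of Cassandra's <code>phi_convict_threshold</code> setting, and hints are kept in an in-memory list rather than in durable storage.

<syntaxhighlight lang="python">
import math
from collections import deque


class PhiAccrualDetector:
    """Suspicion level (phi) for heartbeats from a single peer.

    Simplifying assumption: heartbeat inter-arrival times are exponentially
    distributed, so P(no heartbeat within t) = exp(-t / mean) and
    phi = -log10(exp(-t / mean)) = t / (mean * ln 10).
    """

    def __init__(self, threshold=8.0, window_size=1000):
        self.threshold = threshold                  # conviction threshold (assumed)
        self.intervals = deque(maxlen=window_size)  # sliding window of intervals
        self.last_heartbeat = None

    def heartbeat(self, now):
        """Record a heartbeat received at time `now` (in seconds)."""
        if self.last_heartbeat is not None:
            interval = now - self.last_heartbeat
            if interval > 0:  # ignore duplicate or out-of-order timestamps
                self.intervals.append(interval)
        self.last_heartbeat = now

    def phi(self, now):
        """Return phi at time `now`; 0.0 until at least one interval is known."""
        if not self.intervals:
            return 0.0
        mean = sum(self.intervals) / len(self.intervals)
        return (now - self.last_heartbeat) / (mean * math.log(10))

    def is_convicted(self, now):
        """A peer is convicted once phi exceeds the threshold."""
        return self.phi(now) > self.threshold


class Coordinator:
    """Toy coordinator that keeps hints for replicas it considers down."""

    def __init__(self, replicas):
        self.data = {r: {} for r in replicas}   # stand-in for each replica's storage
        self.up = {r: True for r in replicas}
        self.hints = {r: [] for r in replicas}  # buffered writes per down replica

    def mark_down(self, replica):
        self.up[replica] = False

    def write(self, key, value):
        for replica, store in self.data.items():
            if self.up[replica]:
                store[key] = value
            else:
                self.hints[replica].append((key, value))  # keep a hint instead

    def mark_up(self, replica):
        """Bring a replica back and replay its hints in the order received."""
        self.up[replica] = True
        for key, value in self.hints[replica]:
            self.data[replica][key] = value
        self.hints[replica].clear()


if __name__ == "__main__":
    detector = PhiAccrualDetector()
    for t in range(10):                      # heartbeats at t = 0, 1, ..., 9
        detector.heartbeat(float(t))
    print(round(detector.phi(10.0), 3))      # 0.434 (1 s of silence)
    print(detector.is_convicted(30.0))       # True (phi is about 9.12)

    coordinator = Coordinator(["a", "b", "c"])
    coordinator.mark_down("c")
    coordinator.write("k1", "v1")            # "c" receives only a hint
    coordinator.mark_up("c")
    print(coordinator.data["c"])             # {'k1': 'v1'}
</syntaxhighlight>

Because φ grows with silence relative to the observed heartbeat rate, a peer that normally sends heartbeats slowly is convicted later than one that sends them often, which a fixed timeout cannot do.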
The JMX-compliant ''Nodetool'' utility, for instance, can be used to manage a Cassandra cluster.<ref>{{cite web\|title=NodeTool\|url=https://wiki.apache.org/cassandra/NodeTool\|website=Cassandra Wiki\|access-date=5 January 2016\|archive-url=https://web.archive.org/web/20160113122938/http://wiki.apache.org/cassandra/NodeTool\|archive-date=13 January 2016\|url-status=dead}}</ref> Nodetool also offers a number of commands that return Cassandra metrics on disk usage, latency, compaction, garbage collection, and more.<ref>{{cite web\|title=How to monitor Cassandra performance metrics\|date=3 December 2015\|url=https://www.datadoghq.com/blog/how-to-monitor-cassandra-performance-metrics/\|publisher=Datadog\|access-date=5 January 2016}}</ref> Since Cassandra 2.0.2 (released in 2013), several metrics have been produced via the Dropwizard Metrics framework,<ref>{{cite web\|title=Metrics\|url=https://wiki.apache.org/cassandra/Metrics\|website=Cassandra Wiki\|access-date=5 January 2016\|archive-date=12 November 2015\|archive-url=https://web.archive.org/web/20151112112756/http://wiki.apache.org/cassandra/Metrics\|url-status=dead}}</ref> and they can be queried over JMX using tools such as [[JConsole]] or passed to external monitoring systems via Dropwizard-compatible reporter plugins.<ref>{{cite web\|title=Monitoring\|url=http://cassandra.apache.org/doc/latest/operating/metrics.html\|website=Cassandra Documentation\|access-date=1 February 2018}}</ref>

== Releases ==
Releases after graduation from the Apache Incubator include:
{\| class="wikitable"
\|-
! Version
! Original release date
! Latest version
! Release date
! Status<ref> {{cite web \|title=Cassandra Server Releases \|url=http://cassandra.apache.org/download/ \|access-date=15 December 2015 \|work=cassandra.apache.org}} </ref>
\|-
\| {{Version\|o\|0.6}}
\| 2010-04-12
\| 0.6.13
\| 2011-04-18
\| No longer maintained
\|-
\| {{Version\|o\|0.7}}
\| 2011-01-10
\| 0.7.10
\| 2011-10-31
\| No longer maintained
\|-
\| {{Version\|o\|0.8}}
\| 2011-06-03
\| 0.8.10
\| 2012-02-13
\| No longer maintained
\|-
\| {{Version\|o\|1.0}}
\| 2011-10-18
\| 1.0.12
\| 2012-10-04
\| No longer maintained
\|-
\| {{Version\|o\|1.1}}
\| 2012-04-24
\| 1.1.12
\| 2013-05-27
\| No longer maintained
\|-
\| {{Version\|o\|1.2}}
\| 2013-01-02
\| 1.2.19
\| 2014-09-18
\| No longer maintained
\|-
\| {{Version\|o\|2.0}}
\| 2013-09-03
\| 2.0.17
\| 2015-09-21
\| No longer maintained
\|-
\| {{Version\|o\|2.1}}
\| 2014-09-16
\| 2.1.22
\| 2020-08-31
\| No longer maintained
\|-
\| {{Version\|o\|2.2}}
\| 2015-07-20
\| 2.2.19
\| 2020-11-04
\| No longer maintained
\|-
\| {{Version\|o\|3.0}}
\| 2015-11-09
\| 3.0.29
\| 2023-05-15
\| No longer maintained
\|-
\| {{Version\|o\|3.11}}
\| 2017-06-23
\| 3.11.15
\| 2023-05-05
\| No longer maintained
\|-
\| {{Version\|co\|4.0}}
\| 2021-07-26
\| 4.0.18
\| 2025-05-28
\| Maintained until 5.1.0 release
\|-
\| {{Version\|co\|4.1}}
\| 2022-06-17
\| 4.1.9
\| 2025-05-19
\| Maintained until 5.2.0 release
\|-
\| {{Version\|c\|5.0}}
\| 2024-09-05
\| 5.0.5
\| 2025-08-05
\| Latest release. Maintained until 5.3.0 release
\|-
\| colspan="5" \| <small>{{Version\|l\|show=111110}}</small>
\|}
<!-- o=Old-Not-Supported; co=Old-Still-Supported; c=Latest-Stable; cp=Preview; p=Planned-Future -->

== See also ==
{{Portal\|Free and open-source software}}
* [[Bigtable]] – Original distributed database by Google
* [[Distributed database]]
* [[Distributed hash table]] (DHT)
* [[Dynamo (storage system)]] – Cassandra borrows many elements from Dynamo

== References ==
{{Reflist\|30em}}

==Bibliography==
{{refbegin}}
* {{cite book \| first1 = ~~Eben~~Jeff \| last1 = ~~Hewitt~~Carpenter \| ~~date~~ first2 = ~~December 15, 2010~~Eben \| last2 = Hewitt \| date = January 23, 2022 \| title = Cassandra: The Definitive Guide \| publisher = [[O'Reilly Media]] \| edition = ~~1st~~3rd \| page = ~~300~~432 \| isbn = 978-1-~~4493~~4920-~~9041~~9710-~~9~~5 ~~\| url = http://oreilly.com/catalog/0636920010852~~ }}
* {{cite book \| first1 = Edward \| last1 = Capriolo \| edition = 1st \| page = 324 \| isbn = 978-1-84951-512-~~3~~2 \| url = http://www.packtpub.com/cassandra-apache-high-performance-cookbook/book }}
* {{cite book \| first1 = Eben \| last1 = Hewitt \| date = December 15, 2010 \| title = Cassandra: The Definitive Guide \| publisher = [[O'Reilly Media]] \| edition = 1st \| page = 300 \| isbn = 978-1-4493-9041-9 \| url = http://shop.oreilly.com/product/0636920010852.do }}
{{refend}}

==External links==
{{Commons category}}
~~{{external links\|date=January 2012}}~~
{{Wikiversity\|Big Data/Cassandra}}
* {{cite web \|title=Cassandra - A structured storage system on a P2P Network \|url=https://www.facebook.com/note.php?note_id=24413138919&id=9445547199&index=9 \|first=Avinash \|last=Lakshman ~~\|date=2008-08-25 \|accessdate=2014-06-17 \|publisher=Engineering @ Facebook's Notes}}~~
* {{cite web \|title=Cassandra - A structured storage system on a P2P Network \|url=https://www.facebook.com/note.php?note_id=24413138919&id=9445547199&index=9 \|first=Avinash \|last=Lakshman
\|date=2008-08-25 \|access-date=2014-06-17 \|publisher=Engineering @ Facebook's Notes}}
* {{cite web \|url=https://cassandra.apache.org/ \|title=The Apache Cassandra Project \|~~accessdate~~access-date=2014-06-17 \|publisher=[[Apache Software Foundation\|The Apache Software Foundation]] \|location=Forest Hill, MD, USA}}
* {{cite web \|url=https://wiki.apache.org/cassandra/ \|title=Project Wiki \|~~accessdate~~access-date=2014-06-17 \|publisher=[[Apache Software Foundation\|The Apache Software Foundation]] \|location=Forest Hill, MD, USA \|archive-url=https://web.archive.org/web/20140614175405/http://wiki.apache.org/cassandra/ \|archive-date=2014-06-14 \|url-status=dead }}
* {{cite web \|url=http://www.infoq.com/presentations/Adopting-Apache-Cassandra \|title=Adopting Apache Cassandra \|first=Eben \|last=Hewitt \|date=2010-12-01 \|~~accessdate~~access-date=2014-06-17 \|website=infoq.com \|publisher=InfoQ, C4Media Inc}}
* {{cite web \|first1=Avinash \|last1=Lakshman \|first2=Prashant \|last2=Malik \|url=https://www.cs.cornell.edu/projects/ladis2009/papers/lakshman-ladis2009.pdf \|title=Cassandra - A Decentralized Structured Storage System \|website=cs.cornell.edu \|date=2009-08-15 \|~~accessdate~~access-date=2014-06-17 \|others=The authors are from [[Facebook]]}}
* {{cite web \|url=http://www.slideshare.net/jbellis/what-every-developer-should-know-about-database-scalability \|title=What Every Developer Should Know About Database Scalability \|first=Jonathan \|last=Ellis \|date=2009-07-29 \|~~accessdate~~access-date=2014-06-17 \|website=slideshare.net}} From the [[O'Reilly Open Source Convention\|OSCON]] 2009 talk on RDBMS vs. Dynamo, ~~BigTable~~Bigtable, and Cassandra.
* {{cite web \|url=https://code.google.com/p/cassandra-rpm/ \|title=Cassandra-RPM - Red Hat Package Manager (RPM) build for the Apache Cassandra project \|website=code.google.com \|~~accessdate~~access-date=2014-06-17 \|publisher=[[Google Code#Project hosting\|Google Project Hosting]] \|location=Menlo Park, CA, USA}}
* {{cite web \|url=http://de.slideshare.net/grro/cassandra-by-example-the-path-of-read-and-write-requests \|title=Cassandra by example - the path of read and write requests \|first=Gregor \|last=Roth \|date=2012-10-14\|~~accessdate~~access-date=2014-06-17 \|website=slideshare.net}}
* {{cite web \|url=http://~~www~~10kloc.~~networkworld~~wordpress.com/~~news~~category/~~tech~~cassandra-2/~~2012/102212-nosql-263595.html~~ \|title=A ~~vendor-independent comparison~~collection of ~~NoSQL databases:~~ Cassandra, ~~HBase, MongoDB, Riak~~tutorials \|first=~~Sergey~~Umer \|last=~~Bushik~~Mansoor \|date=2012-~~10~~11-~~22~~04 \|~~work=[[Network World\|NetworkWorld]] \|publisher=[[IDG]] \|location=Framingham, MA, USA and Staines, Middlesex, UK \|accessdate~~access-date=~~2014~~2015-~~06~~02-~~17~~08}}
* {{cite web\|url=http://www.networkworld.com/news/tech/2012/102212-nosql-263595.html \|title=A vendor-independent comparison of NoSQL databases: Cassandra, HBase, MongoDB, Riak \|first=Sergey \|last=Bushik \|date=2012-10-22 \|work=[[Network World\|NetworkWorld]] \|publisher=[[International Data Group\|IDG]] \|location=Framingham, MA, USA and Staines, Middlesex, UK \|access-date=2014-06-17 \|url-status=dead \|archive-url=https://web.archive.org/web/20140528110238/http://www.networkworld.com/news/tech/2012/102212-nosql-263595.html \|archive-date=2014-05-28 }}
* {{cite web\|url=https://db-engines.com/en/system/Apache+Cassandra\|title=Apache Cassandra System Properties\|work=DB-Engines\|access-date=2025-05-28}}

{{Apache Software Foundation}}
~~{{apache}}~~
{{Facebook navbox}}

[[Category:2008 software]]
~~{{DEFAULTSORT:Cassandra (Database)}}~~
[[Category:Apache Software Foundation]]
[[Category:Apache Software Foundation projects]]
[[Category:~~BigTable~~Big ~~implementations~~data products]]
[[Category:Bigtable implementations]]
[[Category:Column-oriented DBMS software for Linux]]
[[Category:Distributed data stores]]
[[Category:Facebook software]]
[[Category:Free database management systems]]
~~[[Category:Structured storage]]~~
[[Category:NoSQL]]
[[Category:Structured storage]]