Data is ingested by Druid directly through its real-time nodes, or batch-loaded into historical nodes from a deep storage facility. Real-time nodes accept [[JSON]]-formatted data from a streaming [[datasource]]. Batch-loaded data can be JSON, CSV, or TSV. Real-time nodes temporarily store and serve data in real time, but eventually push the data to the deep storage facility, from which it is loaded into historical nodes. Historical nodes hold the bulk of data in the cluster.
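The following sketch shows one hypothetical event expressed in each of the three batch formats; all the field names and values are illustrative, and each representation parses to the same logical row:
<syntaxhighlight lang="python">
import csv, io, json

# One hypothetical page-view event in each batch format Druid accepts.
as_json = '{"timestamp": "2023-01-01T00:00:00Z", "page": "home", "views": 3}'
as_csv = "2023-01-01T00:00:00Z,home,3"
as_tsv = "2023-01-01T00:00:00Z\thome\t3"

# CSV and TSV carry no field names, so a column list must be supplied.
fields = ["timestamp", "page", "views"]

record_from_json = json.loads(as_json)
record_from_csv = dict(zip(fields, next(csv.reader(io.StringIO(as_csv)))))
record_from_tsv = dict(zip(fields, next(csv.reader(io.StringIO(as_tsv), delimiter="\t"))))

# All three parse to the same logical row.
assert record_from_json["page"] == record_from_csv["page"] == record_from_tsv["page"]
</syntaxhighlight>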
Real-time nodes chunk data into segments, and are designed to frequently move these segments out to deep storage. To maintain cluster awareness of the ___location of data, these nodes must interact with [[MySQL]] to update metadata about the segments, and with [[Apache ZooKeeper]] to monitor their transfer.
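The following sketch shows the kind of per-segment metadata a real-time node records during hand-off; the field names are simplified illustrations, not Druid's actual metadata schema:
<syntaxhighlight lang="python">
from dataclasses import dataclass

@dataclass
class SegmentMetadata:
    """Illustrative per-segment record of the kind kept in the metadata store."""
    datasource: str         # logical table the segment belongs to
    interval_start: str     # time range covered by the segment (ISO 8601)
    interval_end: str
    version: str            # distinguishes newer segments covering the same interval
    deep_storage_path: str  # where the pushed segment now lives

# Hypothetical record written after a segment is pushed to deep storage.
seg = SegmentMetadata(
    datasource="page_views",
    interval_start="2023-01-01T00:00:00Z",
    interval_end="2023-01-01T01:00:00Z",
    version="v1",
    deep_storage_path="s3://example-bucket/page_views/2023-01-01T00/v1/segment.zip",
)
</syntaxhighlight>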
=== Query Management ===
Client queries first hit broker nodes, which forward them to the appropriate data nodes (either historical or real-time). Since Druid segments may be partitioned, an incoming query can require data from multiple segments and partitions (or [[Sharding|shards]]) stored on different nodes in the cluster. Brokers are able to learn which nodes have the required data, and also merge partial results before returning the aggregated result.
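Brokers accept native queries as JSON over HTTP. The following sketch posts a simple timeseries query to a broker; the host, datasource, and column names are hypothetical, and 8082 is only the default broker port:
<syntaxhighlight lang="python">
import json
import urllib.request

# Hypothetical broker address; 8082 is Druid's default broker port.
BROKER_URL = "http://localhost:8082/druid/v2/"

# A minimal native timeseries query; datasource and column names are made up.
query = {
    "queryType": "timeseries",
    "dataSource": "page_views",
    "granularity": "hour",
    "intervals": ["2023-01-01/2023-01-02"],
    "aggregations": [{"type": "longSum", "name": "views", "fieldName": "views"}],
}

req = urllib.request.Request(
    BROKER_URL,
    data=json.dumps(query).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    # By this point the broker has scattered the query to the historical and
    # real-time nodes holding the relevant segments and merged their
    # partial results.
    print(json.load(resp))
</syntaxhighlight>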
=== Cluster Management ===
Operations relating to data management in historical nodes are overseen by coordinator nodes, which are the prime users of the [[MySQL]] metadata tables. [[Apache ZooKeeper]] is used to register all nodes, manage certain aspects of internode communications, and provide for leader elections.
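The registration and leader-election patterns ZooKeeper provides can be illustrated with the following sketch, which uses the Python <code>kazoo</code> client; the znode paths, hostnames, and payload are hypothetical and do not reflect Druid's internal layout:
<syntaxhighlight lang="python">
from kazoo.client import KazooClient

# Hypothetical ZooKeeper ensemble addresses.
zk = KazooClient(hosts="zk1:2181,zk2:2181,zk3:2181")
zk.start()

# Node registration: an ephemeral znode disappears automatically if this
# node's session dies, which is how the cluster notices a departed node.
zk.create(
    "/druid/announcements/historical-host-1",
    b'{"host": "historical-host-1", "port": 8083}',
    ephemeral=True,
    makepath=True,
)

# Leader election: kazoo wraps ZooKeeper's standard election recipe.
election = zk.Election("/druid/coordinator/leader", "historical-host-1")
# election.run(become_leader)  # blocks until elected, then calls become_leader()
</syntaxhighlight>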