Data is ingested by Druid directly through its real-time nodes, or batch-loaded into historical nodes from a deep storage facility. Real-time nodes accept [[JSON]]-formatted data from a streaming [[datasource]]. Batch-loaded data can be JSON, CSV, or TSV. Real-time nodes temporarily store and serve data in real time, but eventually push the data to the deep storage facility, from which it is loaded into historical nodes. Historical nodes hold the bulk of data in the cluster.
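The following sketch shows one hypothetical event expressed in each of the three batch formats; all the field names and values are illustrative, and each representation parses to the same logical row:
<syntaxhighlight lang="python">
import csv, io, json

# One hypothetical page-view event in each batch format Druid accepts.
as_json = '{"timestamp": "2023-01-01T00:00:00Z", "page": "home", "views": 3}'
as_csv = "2023-01-01T00:00:00Z,home,3"
as_tsv = "2023-01-01T00:00:00Z\thome\t3"

# CSV and TSV carry no field names, so a column list must be supplied.
fields = ["timestamp", "page", "views"]

record_from_json = json.loads(as_json)
record_from_csv = dict(zip(fields, next(csv.reader(io.StringIO(as_csv)))))
record_from_tsv = dict(zip(fields, next(csv.reader(io.StringIO(as_tsv), delimiter="\t"))))

# All three parse to the same logical row.
assert record_from_json["page"] == record_from_csv["page"] == record_from_tsv["page"]
</syntaxhighlight>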
Real-time nodes chunk data into segments, and are designed to frequently move these segments out to deep storage. To maintain cluster awareness of the ___location of data, these nodes must interact with [[MySQL]] to update metadata about the segments, and with [[Apache ZooKeeper]] to monitor their transfer.
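The following sketch shows the kind of per-segment metadata a real-time node records during hand-off; the field names are simplified illustrations, not Druid's actual metadata schema:
<syntaxhighlight lang="python">
from dataclasses import dataclass

@dataclass
class SegmentMetadata:
    """Illustrative per-segment record of the kind kept in the metadata store."""
    datasource: str         # logical table the segment belongs to
    interval_start: str     # time range covered by the segment (ISO 8601)
    interval_end: str
    version: str            # distinguishes newer segments covering the same interval
    deep_storage_path: str  # where the pushed segment now lives

# Hypothetical record written after a segment is pushed to deep storage.
seg = SegmentMetadata(
    datasource="page_views",
    interval_start="2023-01-01T00:00:00Z",
    interval_end="2023-01-01T01:00:00Z",
    version="v1",
    deep_storage_path="s3://example-bucket/page_views/2023-01-01T00/v1/segment.zip",
)
</syntaxhighlight>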
=== Query Management ===
Client queries first hit broker nodes, which forward them to the appropriate data nodes (either historical or real-time). Since Druid segments may be partitioned, an incoming query can require data from multiple segments and partitions (or [[Sharding|shards]]) stored on different nodes in the cluster. Brokers are able to learn which nodes have the required data, and also merge partial results before returning the aggregated result.
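Brokers accept native queries as JSON over HTTP. The following sketch posts a simple timeseries query to a broker; the host, datasource, and column names are hypothetical, and 8082 is only the default broker port:
<syntaxhighlight lang="python">
import json
import urllib.request

# Hypothetical broker address; 8082 is Druid's default broker port.
BROKER_URL = "http://localhost:8082/druid/v2/"

# A minimal native timeseries query; datasource and column names are made up.
query = {
    "queryType": "timeseries",
    "dataSource": "page_views",
    "granularity": "hour",
    "intervals": ["2023-01-01/2023-01-02"],
    "aggregations": [{"type": "longSum", "name": "views", "fieldName": "views"}],
}

req = urllib.request.Request(
    BROKER_URL,
    data=json.dumps(query).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    # By this point the broker has scattered the query to the historical and
    # real-time nodes holding the relevant segments and merged their
    # partial results.
    print(json.load(resp))
</syntaxhighlight>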
=== Cluster Management ===
Operations relating to data management in historical nodes are overseen by coordinator nodes, which are the prime users of the [[MySQL]] metadata tables. [[Apache ZooKeeper]] is used to register all nodes, manage certain aspects of internode communications, and provide for leader elections.
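The registration and leader-election patterns ZooKeeper provides can be illustrated with the following sketch, which uses the Python <code>kazoo</code> client; the znode paths, hostnames, and payload are hypothetical and do not reflect Druid's internal layout:
<syntaxhighlight lang="python">
from kazoo.client import KazooClient

# Hypothetical ZooKeeper ensemble addresses.
zk = KazooClient(hosts="zk1:2181,zk2:2181,zk3:2181")
zk.start()

# Node registration: an ephemeral znode disappears automatically if this
# node's session dies, which is how the cluster notices a departed node.
zk.create(
    "/druid/announcements/historical-host-1",
    b'{"host": "historical-host-1", "port": 8083}',
    ephemeral=True,
    makepath=True,
)

# Leader election: kazoo wraps ZooKeeper's standard election recipe.
election = zk.Election("/druid/coordinator/leader", "historical-host-1")
# election.run(become_leader)  # blocks until elected, then calls become_leader()
</syntaxhighlight>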