Lambda architecture

This is an old revision of this page, as edited by Textractor (talk | contribs) at 00:42, 15 August 2014 (Info and citation on criticism). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.


Lambda architecture is a data-processing architecture designed to handle massive quantities of data while supporting ad-hoc queries at low latency. It attempts to balance comprehensiveness (including all data), accuracy, and latency when querying big-data collections.

Lambda architecture describes a system consisting of three layers:[1]

  • Batch – Precomputes results using a distributed processing system, typically Hadoop. This layer stores a master copy of the entire data set and acts as the system of record.
  • Serving – Responds to ad-hoc queries by gathering data from the batch layer or, if batch results are not yet available, from the speed layer.
  • Speed – Processes data streams without regard to fix-ups or completeness.
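The interplay of the three layers can be illustrated with a minimal in-memory sketch. Everything in it is an assumption made for illustration: the class names, the page-view counting example, and the choice to have the serving layer add the speed layer's recent counts to the batch view (one common reading of how the two are combined). A real deployment would use a distributed batch system such as Hadoop and a stream processor rather than Python objects.

```python
from collections import Counter


class BatchLayer:
    """Keeps the master copy of all events and recomputes a view from scratch."""

    def __init__(self):
        self.master = []  # append-only system of record

    def append(self, event):
        self.master.append(event)

    def recompute(self):
        # Full recomputation over every stored event (slow, but complete).
        return Counter(event["page"] for event in self.master)


class SpeedLayer:
    """Counts only events that arrived after the last batch run."""

    def __init__(self):
        self.recent = Counter()

    def update(self, event):
        self.recent[event["page"]] += 1  # incremental, low latency

    def reset(self):
        self.recent.clear()  # batch has caught up; discard the realtime view


class ServingLayer:
    """Answers queries from the batch view, topped up by the speed layer."""

    def __init__(self, speed):
        self.batch_view = Counter()
        self.speed = speed

    def load(self, batch_view):
        self.batch_view = batch_view

    def query(self, page):
        return self.batch_view[page] + self.speed.recent[page]


def ingest(event, batch, speed):
    # Every incoming event is sent down both paths.
    batch.append(event)
    speed.update(event)


def run_batch(batch, speed, serving):
    # Simplification: a real system cannot swap views and reset atomically.
    serving.load(batch.recompute())
    speed.reset()


batch, speed = BatchLayer(), SpeedLayer()
serving = ServingLayer(speed)

ingest({"page": "/home"}, batch, speed)
ingest({"page": "/home"}, batch, speed)
before_batch = serving.query("/home")  # no batch view yet: speed layer only
run_batch(batch, speed, serving)
ingest({"page": "/home"}, batch, speed)
after_batch = serving.query("/home")   # batch view plus one recent event
```

In this sketch a query issued before the first batch run is answered entirely from the speed layer, and later queries combine the precomputed batch view with whatever has arrived since.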

The architecture relies on a combination of computation techniques, such as partial recomputation (p. 287) and estimation (HyperLogLog), as well as optimizations in resource usage (p. 293) and data transformations.
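As an illustration of the estimation technique named above, the following is a simplified sketch of a HyperLogLog counter for approximating the number of distinct items in a data set. It is not the implementation from the cited book: the class name, the use of SHA-1 as the hash function, and the default precision p=12 are assumptions chosen for brevity. The merge method shows why such sketches suit this kind of system: two sketches built over different portions of the data can be combined without reprocessing the underlying events.

```python
import hashlib
import math


class HyperLogLog:
    """Simplified HyperLogLog over a 64-bit hash (illustrative only)."""

    def __init__(self, p=12):
        self.p = p                    # number of bits used to pick a register
        self.m = 1 << p               # number of registers
        self.registers = [0] * self.m
        # Standard bias-correction constant, valid for m >= 128.
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, item):
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")      # take 64 bits of the hash
        width = 64 - self.p
        index = h >> width                         # top p bits select a register
        rest = h & ((1 << width) - 1)              # remaining bits
        # Rank: 1-based position of the leftmost 1-bit within `rest`.
        rank = width - rest.bit_length() + 1
        self.registers[index] = max(self.registers[index], rank)

    def count(self):
        harmonic = sum(2.0 ** -r for r in self.registers)
        estimate = self.alpha * self.m * self.m / harmonic
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            estimate = self.m * math.log(self.m / zeros)
        return estimate

    def merge(self, other):
        # The union of two sketches is the element-wise maximum of registers.
        if self.m != other.m:
            raise ValueError("sketches must use the same precision")
        self.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
```

With p=12 the sketch uses 4,096 small registers regardless of how many items are added, and the typical relative error is on the order of 1.04/sqrt(4096), roughly 1.6%.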

Criticism of lambda architecture has focused on its inherent complexity. The batch and streaming sides each require their own code base, and the two must be maintained and kept in sync so that the same data produces the same result on both paths.[2]

References

  1. ^ Marz, Nathan, and Warren, James. Big Data: Principles and best practices of scalable realtime data systems. Manning Publications, 2013, p. 13.
  2. ^ Kreps, Jay. "Questioning the Lambda Architecture". radar.oreilly.com. O'Reilly. Retrieved 15 August 2014.