Lambda architecture

This is an old revision of this page, as edited by Textractor (talk | contribs) at 00:52, 15 August 2014. The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.


Lambda architecture is a data-processing architecture designed to handle massive quantities of data while supporting ad-hoc queries and keeping the latency of those queries low. It attempts to balance comprehensiveness (including all data), accuracy, and latency when querying big-data collections.

Lambda architecture describes a system consisting of three layers:[1]

  • Batch – Precomputes results using a distributed processing system, typically Hadoop. This layer stores a master copy of the entire data set and acts as the system of record.
  • Serving – Responds to ad-hoc queries by returning results precomputed by the batch layer or, where those are not yet available, results from the speed layer.
  • Speed – Processes incoming data streams with low latency, without regard to fix-ups or completeness.
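
The interaction of the three layers can be sketched in a few lines of Python. This is a minimal illustration, not the implementation described in the cited book: the page-view data, the names `batch_view`, `SpeedView`, and `query`, and the choice to merge the two views by adding counts are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical master data set: an append-only list of (timestamp, page) events.
master_data = [(1, "home"), (2, "about"), (3, "home")]


def batch_view(events):
    """Batch layer: recompute page-view counts from the entire master data set."""
    return Counter(page for _, page in events)


class SpeedView:
    """Speed layer: counts only the events that arrived after the last batch run."""

    def __init__(self):
        self.counts = Counter()

    def update(self, event):
        _, page = event
        self.counts[page] += 1

    def reset(self):
        # Called once a batch run has absorbed the recent events.
        self.counts.clear()


def query(page, batch, speed):
    """Serving layer: combine the batch result with the recent speed result."""
    return batch.get(page, 0) + speed.counts.get(page, 0)


batch = batch_view(master_data)
speed = SpeedView()

# New events are appended to the master data set and also fed to the speed layer.
for event in [(4, "home"), (5, "contact")]:
    master_data.append(event)
    speed.update(event)

home_before = query("home", batch, speed)        # batch count plus recent count
contact_before = query("contact", batch, speed)  # known only to the speed layer so far

# The next batch run covers all events, so the speed view can be discarded.
batch = batch_view(master_data)
speed.reset()
home_after = query("home", batch, speed)
```

In this sketch a query gives the same answer before and after the batch run; the speed layer only bridges the gap until the batch layer catches up.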

Lambda architecture relies on a combination of computation techniques, such as partial recomputation (p. 287) and estimation (HyperLogLog), as well as optimizations in resource usage (p. 293) and data transformations.
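
To illustrate the estimation technique, the following is a minimal HyperLogLog-style sketch for approximating the number of distinct items in a stream. It is a simplified teaching version under stated assumptions, not the algorithm as used by any particular system: the class name, the use of SHA-1 as the hash function, the default precision `p=12`, and the `merge` method for combining a batch sketch with a speed sketch are all choices made for this example.

```python
import hashlib
import math


class HyperLogLog:
    """Simplified HyperLogLog cardinality estimator (illustrative only)."""

    def __init__(self, p=12):
        if not 7 <= p <= 16:
            raise ValueError("p must be between 7 and 16 in this sketch")
        self.p = p
        self.m = 1 << p                      # number of registers
        self.registers = [0] * self.m
        # Bias-correction constant, valid for m >= 128.
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, item):
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")          # 64-bit hash value
        index = h >> (64 - self.p)                     # first p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)          # remaining 64 - p bits
        # Rank: 1-based position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[index] = max(self.registers[index], rank)

    def merge(self, other):
        """Fold another sketch of the same precision into this one."""
        if other.p != self.p:
            raise ValueError("precision mismatch")
        self.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]

    def count(self):
        z = sum(2.0 ** -r for r in self.registers)
        estimate = self.alpha * self.m * self.m / z
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            estimate = self.m * math.log(self.m / zeros)
        return estimate


# Hypothetical usage: one sketch built by a batch job, one by a stream processor.
batch_sketch = HyperLogLog()
speed_sketch = HyperLogLog()
for i in range(15000):
    batch_sketch.add(f"user-{i}")
for i in range(10000, 20000):
    speed_sketch.add(f"user-{i}")

batch_sketch.merge(speed_sketch)
approx_distinct = batch_sketch.count()   # approximates the 20,000 distinct users
```

Because merging takes the register-wise maximum, combining two sketches yields the same registers as a single sketch built over the union of both inputs, which is what makes this kind of estimate easy to share between a batch path and a streaming path.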

Criticism of lambda architecture has focused on its inherent complexity and its limiting influence. The batch and streaming sides each require a separate code base, and both must be maintained and kept in sync so that processed data produces the same result from both paths. Attempting to abstract the code bases into a single framework, however, puts many of the specialized tools in each side's ecosystem out of reach.[2]

References

  1. ^ Marz, Nathan, and Warren, James. Big Data: Principles and best practices of scalable realtime data systems. Manning Publications, 2013, p. 13.
  2. ^ Kreps, Jay. "Questioning the Lambda Architecture". radar.oreilly.com. O'Reilly. Retrieved 15 August 2014.