Apache Pinot: Difference between revisions

Revision as of 20:44, 14 July 2021 by Forlornacorn (73 edits): Updating project name references to be consistent with Apache Software Foundation.
Latest revision as of 01:11, 15 August 2025 by Citation bot (5,872,275 edits): Removed URL that duplicated identifier. Suggested by CorrectionsJackal \| Category:Apache Software Foundation projects \| #UCB_Category 21/111
(30 intermediate revisions by 13 users not shown)
Line 1:
{{Short description\|Open-source distributed data store}}
{{Infobox software
\| name = Apache Pinot
\| logo = [[File:~~Pinot Logo~~Apache_Pinot_Crimson_logotype,_2023.~~svg~~png\|frameless\|alt=Pinot Logo]]
\| screenshot =
\| caption =
\| author = {{ubl\|Kishore Gopalakrishna\|Xiang Fu}}
\| developer = Apache Pinot
\| latest release version = ~~0.7.1~~1.2.0
\| latest release date = {{Start date and age\|df=yes\|~~2021~~2024\|~~03~~08\|~~18~~21}}
\| repo = [https://~~gitbox~~github.com/apache~~.org~~/~~repos/asf?p=incubator-~~pinot~~.git~~ Pinot repository]
\| programming language = [[Java (programming language)\|Java]]
\| operating system = [[Cross-platform]]
Line 20 ⟶ 21:
}}
'''Apache Pinot''' is a [[Column-oriented DBMS\|column-oriented]], [[open-source software\|open-source]], [[Distributed database\|distributed]] [[data store]] written in [[Java (programming language)\|Java]]. Pinot is designed to execute [[Online analytical processing\|OLAP]] queries with low latency.<ref>{{cite ~~journal~~book \|last1=Cui \|first1=Tingting \|last2=Peng \|first2=Lijun \|last3=Pardoe \|first3=David \|last4=Liu \|first4=Kun \|last5=Agarwal \|first5=Deepak \|last6=Kumar \|first6=Deepak \|title=Proceedings of the ADKDD'17 \|chapter=Data-Driven Reserve Prices for Social Advertising Auctions at LinkedIn ~~\|journal=Proceedings of the ADKDD'17~~ \|date=14 August 2017 \|pages=1–7 \|doi=10.1145/3124749.3124759 \|chapter-url=https://dl.acm.org/doi/abs/10.1145/3124749.3124759 \|publisher=Association for Computing Machinery\|isbn=9781450351942 \|s2cid=12327343 }}</ref><ref>{{cite book \|last1=Rosa \|first1=Marcello La \|title=ADVANCED INFORMATION SYSTEMS ENGINEERING: 33rd International Conference \|date=2021 \|publisher=Springer Nature \|isbn=978-3-030-79382-1 \|url=https://~~www~~books.google.com/books~~/edition/ADVANCED_INFORMATION_SYSTEMS_ENGINEERING/Q7k0EAAAQBAJ~~?~~hl~~id=~~en&gbpv=1~~Q7k0EAAAQBAJ&dq=Apache+Pinot+-wikipedia&pg=PA384&printsec=frontcover \|language=en}}</ref><ref>{{cite book 
\|last1=Koch \|first1=Chris \|title=Introduction to Information Technology \|date=14 November 2018 \|publisher=Scientific e-Resources \|isbn=978-1-83947-240-4 \|url=https://www.google.com/books/edition/Introduction_to_Information_Technology/9eLEDwAAQBAJ?hl=en&gbpv=1&dq=Pinot+(data+store)+-wikipedia&pg=PA130&printsec=frontcover \|language=en}}</ref><ref>{{cite book \|last1=Chin \|first1=Francis Y. L. \|last2=Chen \|first2=C. L. Philip \|last3=Khan \|first3=Latifur \|last4=Lee \|first4=Kisung \|last5=Zhang \|first5=Liang-Jie \|title=Big Data – BigData 2018: 7th International Congress, Held as Part of the Services Conference Federation, SCF 2018, Seattle, WA, USA, June 25–30, 2018, Proceedings \|date=20 June 2018 \|publisher=Springer \|isbn=978-3-319-94301-5 \|page=153 \|url=https://books.google.com/books?id=eSVhDwAAQBAJ \|language=en}}</ref><ref>{{cite book \|last1=Im \|first1=Jean-François \|last2=Gopalakrishna \|first2=Kishore \|last3=Subramaniam \|first3=Subbu \|last4=Shrivastava \|first4=Mayank \|last5=Tumbde \|first5=Adwait \|last6=Jiang \|first6=Xiaotian \|last7=Dai \|first7=Jennifer \|last8=Lee \|first8=Seunghyun \|last9=Pawar \|first9=Neha \|last10=Li \|first10=Jialiang \|last11=Aringunram \|first11=Ravi \|title=Proceedings of the 2018 International Conference on Management of Data \|chapter=Pinot: Realtime OLAP for 530 Million Users \|series=Sigmod '18 \|date=2018-05-27 \|pages=583–594 \|doi=10.1145/3183713.3190661 \|url=https://dl.acm.org/doi/10.1145/3183713.3190661#d13801648e1 \|publisher=Association for Computing Machinery\|isbn=9781450347037 \|s2cid=44083085 }}</ref><ref>{{cite web \|title=The Apache Software Foundation Announces Apache® Pinot™ as a Top-Level Project \|url=https://blogs.apache.org/foundation/entry/the-apache-software-foundation-announces76 \|website=blogs.apache.org\|date=2 August 2021 }}</ref> It is suited in contexts where fast analytics, such as aggregations, are needed on immutable data, possibly, with real-time data 
ingestion.<ref>{{cite ~~journal~~arXiv \|last1=Rogers \|first1=Ryan \|last2=Subramaniam \|first2=Subbu \|last3=Peng \|first3=Sean \|last4=Durfee \|first4=David \|last5=Lee \|first5=Seunghyun \|last6=Kancha \|first6=Santosh Kumar \|last7=Sahay \|first7=Shraddha \|last8=Ahammad \|first8=Parvez \|title=LinkedIn's Audience Engagements API: A Privacy Preserving Data Analytics System at Scale ~~\|journal=arXiv:2002.05839 [cs]~~ \|date=16 November 2020 \|~~url~~class=~~https://arxiv~~cs.~~org/abs/~~CR \|eprint=2002.05839}}</ref><ref>{{cite ~~journal~~book \|last1=Javadi \|first1=Seyyed Ahmad \|last2=Gupta \|first2=Harsh \|last3=Manhas \|first3=Robin \|last4=Sahu \|first4=Shweta \|last5=Gandhi \|first5=Anshul \|title=2018 IEEE 38th International Conference on Distributed Computing Systems (ICDCS) \|chapter=EASY: Efficient Segment Assignment Strategy for Reducing Tail Latencies in Pinot ~~\|journal=2018 IEEE 38th International Conference on Distributed Computing Systems (ICDCS)~~ \|date=July 2018 \|pages=1432–1437 \|doi=10.1109/ICDCS.2018.00144 \|~~url~~isbn=~~https://ieeexplore.ieee.org/abstract/document/8416407~~978-1-5386-6871-9 \|s2cid=21659844 }}</ref><ref name="pinot-joins-apache-foundation">Pawar, Neha. [https://engineering.linkedin.com/blog/2019/03/pinot-joins-apache-incubator "Pinot Joins Apache Incubator"] {{Webarchive\|url=https://web.archive.org/web/20190402090136/https://engineering.linkedin.com/blog/2019/03/pinot-joins-apache-incubator \|date=2019-04-02 }}, ''LinkedIn Engineering'', 01 April 2019</ref> The name Pinot comes from the [[Pinot grape]] vines that are pressed into liquid that is used to produce a variety of different wines. 
The founders of the database chose the name as a metaphor for analyzing vast quantities of data from a variety of different file formats or streaming data sources.<ref name="open-sourcing-pinot">{{cite web \|last1=Gopalakrishna \|first1=Kishore \|title=Open Sourcing Pinot: Scaling the Wall of Real-Time Analytics \|url=https://engineering.linkedin.com/pinot/open-sourcing-pinot-scaling-wall-real-time-analytics \|website=engineering.linkedin.com \|publisher=LinkedIn \|accessdate=3 September 2020 \|archiveurl=https://web.archive.org/web/20150910081445/http://engineering.linkedin.com/pinot/open-sourcing-pinot-scaling-wall-real-time-analytics \|archivedate=10 September 2015 \|language=en}}</ref> Pinot was first created at [[LinkedIn]] after the engineering staff determined that there were no off the shelf solutions that met the social networking site's requirements like predictable low latency, data freshness in seconds, fault tolerance and scalability.<ref name="open-sourcing-pinot" /><ref>{{cite news \|last1=Yegulalp \|first1=Serdar \|title=LinkedIn fills another SQL-on-Hadoop niche \|url=https://www.infoworld.com/article/2934506/linkedins-pinot-fills-another-sql-on-hadoop-niche.html \|work=InfoWorld \|date=2015-06-11 \|language=en}}</ref> Pinot is used in production by technology companies such as [[Uber]],<ref>{{cite ~~journal~~book \|last1=Fu \|first1=Yupeng \|last2=Soman \|first2=Chinmay \|title~~=Real-time Data Infrastructure at Uber \|journal~~=Proceedings of the 2021 International Conference on Management of Data \|chapter=Real-time Data Infrastructure at Uber \|series=Sigmod/Pods '21 \|date=9 June 2021 \|pages=2503–2516 \|doi=10.1145/3448016.3457552 \|chapter-url=https://dl.acm.org/doi/abs/10.1145/3448016.3457552 \|publisher=Association for Computing Machinery\|arxiv=2104.00087 \|isbn=9781450383431 \|s2cid=232478317 }}</ref> [[Microsoft]],<ref name="pinot-joins-apache-foundation" /> and [[Factual]]. 
== History ==
Line 30 ⟶ 31:
[[File:Pinot Architecture.png\|520x520px\|thumb\|alt=Architecture of Apache Pinot\|Architecture diagram of Apache Pinot]]
Pinot uses [[Apache Helix]] for cluster management. Helix is embedded as an agent within the different components and uses [[Apache ZooKeeper]] for coordination and maintaining the overall cluster state and health. All Pinot servers and brokers are managed by Helix. Helix is a generic cluster management framework to manage partitions and replicas in a distributed system. ~~<br>~~
=== Query management ===
Line 39:
== Features ==
Pinot shares similar features with comparable OLAP datastores, such as [[Apache Druid]].<ref>{{cite book \|last1=Ordonez \|first1=Carlos \|last2=Song \|first2=Il-Yeol \|last3=Anderst-Kotsis \|first3=Gabriele \|last4=Tjoa \|first4=A. Min \|last5=Khalil \|first5=Ismail \|title=Big Data Analytics and Knowledge Discovery: 21st International Conference, DaWaK 2019, Linz, Austria, August 26–29, 2019, Proceedings \|date=2 October 2019 \|publisher=Springer \|isbn=978-3-030-27520-4 \|page=170 \|url=https://~~www~~books.google.com/books~~/edition/Big_Data_Analytics_and_Knowledge_Discove/~~?id=sf-pDwAAQBAJ~~?hl=en&gbpv=1~~&dq=Pinot+(data+store)+-wikipedia&pg=PA170~~&printsec=frontcover~~ \|language=en}}</ref><ref>{{cite book \|last1=Uttamchandani \|first1=Sandeep \|title=The Self-Service Data Roadmap \|date=10 September 2020 \|publisher="O'Reilly Media, Inc." \|isbn=978-1-4920-7520-2 \|url=https://~~www~~books.google.com/books~~/edition/The_Self_Service_Data_Roadmap/pEn8DwAAQBAJ~~?~~hl~~id=~~en&gbpv=1~~pEn8DwAAQBAJ&dq=Pinot+(data+store)+-wikipedia&pg=PT72~~&printsec=frontcover~~ \|language=en}}</ref>. Like Druid, Pinot is a column-oriented database with various compression schemes such as [[~~Run~~run-length encoding\|Run Length]] and [[~~Variable~~variable-length ~~encoding~~code\|Fixed -Bit Length]]. 
Pinot supports pluggable [[Database index\|indexing technologies]] - Sorted Index, [[Bitmap Index]], [[Inverted index\|Inverted Index]], Star-Tree Index, and Range Index, which are what primarily differentiates Pinot from other OLAP datastores. Pinot supports near real-time ingestion from streams such as [[Apache Kafka\|Kafka]], [[AWS]] Kinesis and [[Batch processing\|batch]] ingestion from sources such as [[Hadoop]], [[Amazon S3\|S3]], [[Microsoft Azure\|Azure]], [[Google Cloud Storage\|GCS]]. Like ~~mostly, all~~most other [[Online analytical processing\|OLAP]] datastores and [[data warehousing]] solutions, Pinot supports a [[SQL]]-like query language that supports selection, aggregation, filtering, group by, order by, distinct queries on data.
== See also ==
{{Portal\|Free and open-source software}}
* [[List of column-oriented DBMSes]]
* [[Comparison of OLAP servers]]
== References ==
{{Reflist\|30em}}
== External links ==
Line 58 ⟶ 60:
[[Category:Structured storage]]
[[Category:Free database management systems]]
[[Category:Free software programmed in Java (programming language)]]
[[Category:Database engines]]
[[Category:Big data products]]