rmv non-WP:RS : publisher of copied Wikipedia articles, see WP:CIRCULAR (ed-tech press scam) |
m clean up, citation bot to follow |
Line 21:
}}
'''Apache Pinot''' is a [[Column-oriented DBMS|column-oriented]], [[open-source software|open-source]], [[Distributed database|distributed]] [[data store]] written in [[Java (programming language)|Java]]. Pinot is designed to execute OLAP queries with low latency.<ref>{{cite journal |last1=Cui |first1=Tingting |last2=Peng |first2=Lijun |last3=Pardoe |first3=David |last4=Liu |first4=Kun |last5=Agarwal |first5=Deepak |last6=Kumar |first6=Deepak |title=Data-Driven Reserve Prices for Social Advertising Auctions at LinkedIn |journal=Proceedings of the ADKDD'17 |series=Adkdd'17 |date=14 August 2017 |pages=1–7 |doi=10.1145/3124749.3124759 |url=https://dl.acm.org/doi/abs/10.1145/3124749.3124759 |publisher=Association for Computing Machinery|isbn=9781450351942 |s2cid=12327343 }}</ref><ref>{{cite book |last1=Rosa |first1=Marcello La |title=ADVANCED INFORMATION SYSTEMS ENGINEERING: 33rd International Conference |date=2021 |publisher=Springer Nature |isbn=978-3-030-79382-1 |url=https://www.google.com/books/edition/ADVANCED_INFORMATION_SYSTEMS_ENGINEERING/Q7k0EAAAQBAJ?hl=en&gbpv=1&dq=Apache+Pinot+-wikipedia&pg=PA384&printsec=frontcover |language=en}}</ref><ref>{{cite book |last1=Chin |first1=Francis Y. L. |last2=Chen |first2=C. L. 
Philip |last3=Khan |first3=Latifur |last4=Lee |first4=Kisung |last5=Zhang |first5=Liang-Jie |title=Big Data – BigData 2018: 7th International Congress, Held as Part of the Services Conference Federation, SCF 2018, Seattle, WA, USA, June 25–30, 2018, Proceedings |date=20 June 2018 |publisher=Springer |isbn=978-3-319-94301-5 |page=153 |url=https://books.google.com/books?id=eSVhDwAAQBAJ |language=en}}</ref><ref>{{cite book |last1=Im |first1=Jean-François |last2=Gopalakrishna |first2=Kishore |last3=Subramaniam |first3=Subbu |last4=Shrivastava |first4=Mayank |last5=Tumbde |first5=Adwait |last6=Jiang |first6=Xiaotian |last7=Dai |first7=Jennifer |last8=Lee |first8=Seunghyun |last9=Pawar |first9=Neha |last10=Li |first10=Jialiang |last11=Aringunram |first11=Ravi |title=Pinot: Realtime OLAP for 530 Million Users |series=Sigmod '18 |date=2018-05-27 |pages=583–594 |doi=10.1145/3183713.3190661 |url=https://dl.acm.org/doi/10.1145/3183713.3190661#d13801648e1 |publisher=Association for Computing Machinery|isbn=9781450347037 |s2cid=44083085 }}</ref> It is suited to contexts where fast analytics, such as aggregations, are needed on immutable data, possibly with real-time data ingestion.<ref>{{cite
Pinot was first created at [[LinkedIn]] after the engineering staff determined that there were no off-the-shelf solutions that met the social networking site's requirements, such as predictable low latency, data freshness within seconds, fault tolerance, and scalability.<ref name="open-sourcing-pinot" /><ref>{{cite news |last1=Yegulalp |first1=Serdar |title=LinkedIn fills another SQL-on-Hadoop niche |url=https://www.infoworld.com/article/2934506/linkedins-pinot-fills-another-sql-on-hadoop-niche.html |work=InfoWorld |date=2015-06-11 |language=en}}</ref> Pinot is used in production by technology companies such as [[Uber]],<ref>{{cite journal |last1=Fu |first1=Yupeng |last2=Soman |first2=Chinmay |title=Real-time Data Infrastructure at Uber |journal=Proceedings of the 2021 International Conference on Management of Data |series=Sigmod/Pods '21 |date=9 June 2021 |pages=2503–2516 |doi=10.1145/3448016.3457552 |url=https://dl.acm.org/doi/abs/10.1145/3448016.3457552 |publisher=Association for Computing Machinery|arxiv=2104.00087 |isbn=9781450383431 |s2cid=232478317 }}</ref> [[Microsoft]],<ref name="pinot-joins-apache-foundation" /> and [[Factual]].
Line 39:
== Features ==
Pinot shares features with comparable OLAP datastores, such as [[Apache Druid]].<ref>{{cite book |last1=Ordonez |first1=Carlos |last2=Song |first2=Il-Yeol |last3=Anderst-Kotsis |first3=Gabriele |last4=Tjoa |first4=A. Min |last5=Khalil |first5=Ismail |title=Big Data Analytics and Knowledge Discovery: 21st International Conference, DaWaK 2019, Linz, Austria, August 26–29, 2019, Proceedings |date=2 October 2019 |publisher=Springer |isbn=978-3-030-27520-4 |page=170 |url=https://www.google.com/books/edition/Big_Data_Analytics_and_Knowledge_Discove/sf-pDwAAQBAJ?hl=en&gbpv=1&dq=Pinot+(data+store)+-wikipedia&pg=PA170&printsec=frontcover |language=en}}</ref><ref>{{cite book |last1=Uttamchandani |first1=Sandeep |title=The Self-Service Data Roadmap |date=10 September 2020 |publisher="O'Reilly Media, Inc." |isbn=978-1-4920-7520-2 |url=https://www.google.com/books/edition/The_Self_Service_Data_Roadmap/pEn8DwAAQBAJ?hl=en&gbpv=1&dq=Pinot+(data+store)+-wikipedia&pg=PT72&printsec=frontcover |language=en}}</ref> Like Druid, Pinot is a column-oriented database with various compression schemes such as [[Run-length encoding|Run Length]] and [[Variable-length encoding|Fixed Bit Length]]. Pinot supports pluggable [[Database index|indexing technologies]], including Sorted Index, [[Bitmap Index]], [[Inverted index|Inverted Index]], Star-Tree Index, and Range Index, which are what primarily differentiate Pinot from other OLAP datastores.
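As a rough illustration of two of the ideas named above, the following Python sketch run-length encodes a sorted column and builds an inverted index over it. It is a generic, minimal sketch and not Pinot's implementation; the function names and the sample <code>country</code> column are hypothetical.

```python
from collections import defaultdict


def run_length_encode(column):
    """Collapse consecutive repeated values into (value, count) pairs."""
    runs = []
    for value in column:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([value, 1])   # start a new run
    return [tuple(run) for run in runs]


def build_inverted_index(column):
    """Map each distinct value to the ascending list of row ids holding it."""
    index = defaultdict(list)
    for row_id, value in enumerate(column):
        index[value].append(row_id)
    return dict(index)


# Hypothetical column, already sorted so that equal values are adjacent.
country = ["CA", "CA", "CA", "UK", "US", "US"]
runs = run_length_encode(country)
index = build_inverted_index(country)
```

Run-length encoding pays off when equal values sit next to each other, which is why it pairs naturally with a sorted column, while the inverted index lets a filter such as <code>country = 'US'</code> jump straight to the matching row ids.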
Pinot supports near real-time ingestion from streams such as [[Apache Kafka|Kafka]] and [[AWS]] Kinesis, and [[Batch processing|batch]] ingestion from sources such as [[Hadoop]], [[Amazon S3|S3]], [[Microsoft Azure|Azure]], and [[Google Cloud Storage|GCS]]. Like most other [[Online analytical processing|OLAP]] datastores and [[data warehousing]] solutions, Pinot supports a [[SQL]]-like query language that supports selection, aggregation, filtering, group by, order by, and distinct queries on data.
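The query operations listed above can be illustrated with a small Python sketch that evaluates one filter, group-by, aggregation, and order-by over in-memory rows. The table name <code>events</code>, its columns, and the query in the comment are hypothetical, and the code shows only the semantics of such a query, not how Pinot executes it.

```python
from collections import defaultdict

# Hypothetical query whose semantics are sketched below:
#   SELECT country, SUM(clicks) AS total
#   FROM events
#   WHERE device = 'mobile'
#   GROUP BY country
#   ORDER BY total DESC


def total_clicks_by_country(rows, device):
    """Return (country, total clicks) pairs for one device, largest total first."""
    totals = defaultdict(int)
    for row in rows:
        if row["device"] == device:                   # filtering (WHERE)
            totals[row["country"]] += row["clicks"]   # GROUP BY with SUM
    # ORDER BY total DESC
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical sample rows.
events = [
    {"country": "CA", "device": "mobile", "clicks": 3},
    {"country": "US", "device": "mobile", "clicks": 5},
    {"country": "CA", "device": "desktop", "clicks": 7},
    {"country": "US", "device": "mobile", "clicks": 2},
    {"country": "UK", "device": "mobile", "clicks": 4},
]
result = total_clicks_by_country(events, "mobile")
```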