Trino (SQL query engine): Difference between revisions

FrescoBot (talk | contribs)
m Bot: link syntax and minor changes
WikiCleanerBot (talk | contribs)
m v2.04b - Bot T20 CW#61 - Fix errors for CW project (Reference before punctuation)
Line 16:
}}
 
'''Trino''' is an [[Open-source software|open-source]] distributed [[SQL]] query engine designed to query large data sets distributed over one or more heterogeneous data sources.<ref>{{cite web |title=Overview — Trino 361 Documentation |url=https://trino.io/docs/361/overview.html |website=trino.io |access-date=20 September 2021}}</ref> Trino is commonly used as a query engine over [[Data lake|data lakes]] and [[Data Warehouse|data warehouses]] using the [[Apache Hive|Hive]] and [[List of Apache Software Foundation projects#Active projects|Iceberg]]<ref name="iceberg">{{cite web |title=About - Apache Iceberg |url=http://iceberg.apache.org/ |website=iceberg.apache.org |access-date=18 September 2021}}</ref> table formats. In these configurations, Trino can query data in [[Free and open source software|open]] [[Column-oriented DBMS|column-oriented]] data file formats like [[Apache ORC|ORC]] or [[Apache Parquet|Parquet]] residing on different storage systems like [[Apache Hadoop#Hadoop distributed file system|HDFS]], [[Amazon S3|AWS S3]], [[Google Cloud Storage]], or [[Microsoft Azure#Storage services|Azure Blob Storage]]. Trino can also run federated queries across multiple disparate data sources such as [[MySQL]], [[PostgreSQL]], [[Apache Cassandra|Cassandra]], [[Apache Kafka|Kafka]], [[MongoDB]] and [[Elasticsearch]]. Trino is community driven and released under the [[Apache License]].
 
== History ==
Trino was originally designed and developed by Martin Traverso, Dain Sundstrom, David Phillips, and Eric Hwang at [[Facebook]] to allow data analysts to run interactive queries on its large [[data warehouse]] in [[Apache Hadoop]]. The project was originally named [[Presto (SQL query engine)|Presto]] and shares the first six years of development with the Presto project.<ref>{{cite web |title=Contributors to trinodb/trino |url=https://github.com/trinodb/trino/graphs/contributors?from=2012-08-05&to=2018-08-05&type=c |website=GitHub |access-date=20 September 2021 |language=en}}</ref><ref>{{cite web |title=Contributors to prestodb/presto |url=https://github.com/prestodb/presto/graphs/contributors?from=2012-08-05&to=2018-08-05&type=c |website=GitHub |access-date=20 September 2021 |language=en}}</ref> Before Presto, data analysts at Facebook relied on [[Apache Hive]], which was too slow for running interactive SQL analytics on the company's 250-petabyte data warehouse.<ref name="2013facebook">{{Cite news|url=http://www.computerworld.com/article/2485668/business-intelligence/facebook-goes-open-source-with-query-engine-for-big-data.html|title=Facebook goes open source with query engine for big data|author=Joab Jackson|date=November 6, 2013|work=Computer World|access-date=April 26, 2017}}</ref>
 
Traverso, Sundstrom, Phillips, and Hwang began development in 2012 and deployed an initial version later that year. Facebook announced the project's release as open source in late fall 2013.<ref name="2013facebook" /><ref name="2013facebook2">{{Cite news|url=https://gigaom.com/2013/06/06/facebook-unveils-presto-engine-for-querying-250-pb-data-warehouse/|title=Facebook unveils Presto engine for querying 250 PB data warehouse|author=Jordan Novet|date=June 6, 2013|work=Giga Om|access-date=April 26, 2017}}</ref> As Presto gained popularity, well-known companies such as [[Netflix]]<ref>{{Cite news|url=http://techblog.netflix.com/2014/10/using-presto-in-our-big-data-platform.html|title=Using Presto in our Big Data Platform on AWS|authors=Eva Tse, Zhenxiao Luo, Nezih Yigitbasi|date=October 7, 2014|work=Netflix technical blog|access-date=April 26, 2017}}</ref> and [[Airbnb]]<ref>{{cite web |title=Airpal: a Web UI for PrestoDB |url=https://medium.com/airbnb-engineering/airpal-a-web-based-query-execution-tool-for-data-analysis-33c43265ed1f |website=Medium |access-date=20 September 2021 |language=en |date=4 April 2016}}</ref> disclosed that they used Presto in both on-premises and cloud deployments at comparable petabyte scales. In late 2016, Amazon announced that it would provide Presto as a service called Athena.<ref>{{cite web |title=AWS Launches Amazon Athena {{!}} Amazon.com, Inc. - Press Room |url=https://press.aboutamazon.com/news-releases/news-release-details/aws-launches-amazon-athena |website=press.aboutamazon.com |access-date=20 September 2021 |language=en}}</ref>
 
In late 2018, a disagreement over the stewardship of Presto arose between the founders and Facebook as Facebook management pushed for tighter control over the project. This move included giving automatic committership rights to Facebook developers without prior experience with the project. Shortly after Facebook management moved forward with these changes, the creators left the original Presto project to create a fork.<ref name="2020rename">{{cite web |last1=Traverso |first1=Martin |last2=Sundstrom |first2=Dain |last3=Phillips |first3=David |title=We’re rebranding PrestoSQL as Trino |url=https://trino.io/blog/2020/12/27/announcing-trino.html |website=trino.io |access-date=7 September 2021 |language=en |date=27 December 2020}}</ref> This fork was also initially named Presto, so to differentiate the two, users called the original project PrestoDB and the fork PrestoSQL, after their respective web addresses, https://prestodb.io and [https://trino.io https://prestosql.io]. The split has similarities to the [[Jenkins (software)#History|Jenkins and Hudson split]].
Line 49:
== Use Cases ==
 
In general, Trino is used for [[OLAP]] scenarios rather than [[OLTP]] uses.<ref>{{cite web |title=Use cases — Trino 361 Documentation |url=https://trino.io/docs/361/overview/use-cases.html |website=trino.io |access-date=20 September 2021}}</ref>
 
=== Data Lake Query Engine ===