Trino (SQL query engine): Difference between revisions

m error 64 in CWP + clean up
Line 15:
}}
 
'''Trino''' is an [[Open-source software|open-source]] distributed [[SQL]] query engine designed to query large data sets distributed over one or more heterogeneous data sources.<ref>{{cite web |title=Overview — Trino 393 Documentation |url=https://trino.io/docs/393/overview.html |website=trino.io |access-date=25 August 2022}}</ref> Trino can query [[Data lake|data lakes]] that contain [[Free and open source software|open]] [[Column-oriented DBMS|column-oriented]] data file formats such as [[Apache ORC|ORC]] or [[Apache Parquet|Parquet]]<ref name="hive-connector" /><ref name="iceberg-connector" /> residing on different storage systems such as [[Apache Hadoop#Hadoop distributed file system|HDFS]], [[Amazon S3|AWS S3]], [[Google Cloud Storage]], or [[Microsoft Azure#Storage services|Azure Blob Storage]]<ref name="trino-definitive-guide-ch1" /> using the [[Hive]]<ref name="hive-connector">{{cite web |title=Hive connector — Trino 393 Documentation |url=https://trino.io/docs/393/connector/hive.html |website=trino.io}}</ref> and [[List of Apache Software Foundation projects#Active projects|Iceberg]]<ref name="iceberg-connector">{{cite web |title=Iceberg connector — Trino 393 Documentation |url=https://trino.io/docs/393/connector/iceberg.html |website=trino.io |access-date=25 August 2022}}</ref> table formats. Trino can also run federated queries against tables in different data sources such as [[MySQL]], [[PostgreSQL]], [[Apache Cassandra|Cassandra]], [[Apache Kafka|Kafka]], [[MongoDB]] and [[Elasticsearch]].<ref>{{cite web |title=Connectors — Trino 393 Documentation |url=https://trino.io/docs/393/connector.html |website=trino.io |access-date=25 August 2022}}</ref> Trino is released under the [[Apache License]].<ref>{{cite web |title=trinodb/trino LICENSE |url=https://github.com/trinodb/trino/blob/master/LICENSE |publisher=Trino |access-date=25 August 2022 |date=25 August 2022}}</ref>
 
 
== History ==
 
In January 2019, the original creators of [[Presto (SQL query engine)|Presto]], Martin Traverso, Dain Sundstrom, and David Phillips, created a [[Fork (software development)|fork]] of the Presto project. They initially kept the name Presto and used the PrestoSQL web handle to distinguish it from the original PrestoDB project. Simultaneously, they announced the Presto Software Foundation. The foundation is a not-for-profit organization dedicated to the advancement of the Presto open source distributed SQL query engine.<ref name="2019psf">{{Cite web|url=https://www.prweb.com/releases/presto_software_foundation_launches_to_advance_presto_open_source_community/prweb16070792.htm|title=Presto Software Foundation Launches to Advance Presto Open Source Community|website=PRWeb|access-date=2019-02-01}}</ref><ref name="2019psf2">{{Cite web|url=https://thenewstack.io/prestos-new-foundation-signals-growth-for-the-big-data-sql-engine/|title=Presto's New Foundation Signals Growth for the Big Data SQL Engine|date=2019-01-31|website=The New Stack|language=en-US|access-date=2019-02-01}}</ref>
 
In December 2020, PrestoSQL was rebranded as Trino. The Presto Software Foundation became the Trino Software Foundation, and the code base and all other PrestoSQL assets were renamed as part of the rebrand.<ref name="2020rename">{{cite web |last1=Traverso |first1=Martin |last2=Sundstrom |first2=Dain |last3=Phillips |first3=David |title=We're rebranding PrestoSQL as Trino |url=https://trino.io/blog/2020/12/27/announcing-trino.html |website=trino.io |access-date=7 September 2021 |language=en |date=27 December 2020}}</ref>
Line 31 ⟶ 29:
[[File:Figure 4-1 Trino architecture.png|thumb|Trino architecture overview with coordinator and workers<ref name="trino-definitive-guide-ch4">{{cite book |last1=Fuller |first1=Matt |last2=Moser |first2=Manfred |last3=Traverso |first3=Martin |title=Trino: The Definitive Guide |chapter=Chapter 4. Trino Architecture |date=2021 |publisher=O'Reilly Media, Inc, USA |isbn=9781098107710 |pages=43–72}}</ref>]]
 
Trino is written in [[Java (programming language)|Java]].<ref name="trino-definitive-guide-ch2">{{cite book |last1=Fuller |first1=Matt |last2=Moser |first2=Manfred |last3=Traverso |first3=Martin |title=Trino: The Definitive Guide |chapter=Chapter 2. Installing and Configuring Trino |date=2021 |publisher=O'Reilly Media, Inc, USA |isbn=9781098107710 |pages=19–24}}</ref> It runs on a cluster of servers made up of two types of nodes: a '''coordinator''' and '''workers'''.<ref name="trino-definitive-guide-ch4" />
 
* The coordinator is responsible for parsing, analyzing, optimizing, planning, and scheduling a query submitted by a client. The coordinator interacts with the [[service provider interface]] (SPI) to obtain the available tables, table statistics, and other information needed to carry out its tasks.<ref name="trino-definitive-guide-ch4" />
 
* The workers are responsible for executing the tasks and operators fed to them by the scheduler. These tasks process rows from the data sources and produce results that are returned to the coordinator and ultimately to the client.<ref name="trino-definitive-guide-ch4" />
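
The division of labor between the coordinator and the workers can be illustrated with a toy sketch. It is not Trino code and does not use any Trino API: the names <code>Task</code>, <code>worker_execute</code> and <code>ToyCoordinator</code> are hypothetical, threads stand in for worker nodes, and a simple row filter stands in for a query operator.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, object]


@dataclass
class Task:
    """Hypothetical unit of work: a slice of rows plus the filter to apply."""
    rows: List[Row]
    predicate: Callable[[Row], bool]


def worker_execute(task: Task) -> List[Row]:
    """Worker role: run the operator (here, a filter) over the task's rows."""
    return [row for row in task.rows if task.predicate(row)]


class ToyCoordinator:
    """Coordinator role: plan tasks, schedule them, and gather the results."""

    def __init__(self, num_workers: int) -> None:
        self.num_workers = num_workers

    def plan(self, rows: List[Row], predicate: Callable[[Row], bool]) -> List[Task]:
        # Split the input into contiguous slices, at most one per worker.
        size = max(1, -(-len(rows) // self.num_workers))  # ceiling division
        return [
            Task(rows[start:start + size], predicate)
            for start in range(0, len(rows), size)
        ]

    def run(self, rows: List[Row], predicate: Callable[[Row], bool]) -> List[Row]:
        tasks = self.plan(rows, predicate)
        # Threads stand in for worker nodes; map() keeps the task order.
        with ThreadPoolExecutor(max_workers=self.num_workers) as pool:
            partial_results = list(pool.map(worker_execute, tasks))
        # Merge the partial results before handing them back to the client.
        return [row for part in partial_results for row in part]


if __name__ == "__main__":
    table = [{"id": i, "region": "eu" if i % 2 else "us"} for i in range(10)]
    coordinator = ToyCoordinator(num_workers=3)
    print(coordinator.run(table, lambda row: row["region"] == "eu"))
```

In this sketch the coordinator never touches individual rows; it only splits the work and merges what the workers return, mirroring the separation of roles described in the list above.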
 
Line 51 ⟶ 48:
 
== References ==
{{Reflist}}
 
== External links ==