Trino (SQL query engine): Difference between revisions

Replace specific Athena reference with overview of different use cases and link to users page on Trino site
Tag: references removed
m showing breadth of file formats, not just columnar ones
 
Line 16:
}}
 
'''Trino''' is an [[Open-source software|open-source]] distributed [[SQL]] query engine designed to query large data sets distributed over one or more heterogeneous data sources.<ref>{{cite web |title=Overview — Trino 468 Documentation |url=https://trino.io/docs/468/overview.html |website=trino.io |access-date=27 December 2024}}</ref> Trino can query [[data lake]]s that contain a variety of file formats, ranging from simple row-oriented CSV and JSON data files to more performant [[Free and open source software|open]] [[Column-oriented DBMS|column-oriented]] data file formats like [[Apache ORC|ORC]] or [[Apache Parquet|Parquet]]<ref name="hive-connector" /><ref name="iceberg-connector" /> residing on different storage systems like [[Apache Hadoop#Hadoop distributed file system|HDFS]], [[Amazon S3|AWS S3]], [[Google Cloud Storage]], or [[Microsoft Azure#Storage services|Azure Blob Storage]]<ref name="trino-definitive-guide-ch1" /> using the [[Apache Hive|Hive]]<ref name="hive-connector">{{cite web |title=Hive connector — Trino 393 Documentation |url=https://trino.io/docs/393/connector/hive.html |website=trino.io}}</ref> and [[List of Apache Software Foundation projects#Active projects|Iceberg]]<ref name="iceberg-connector">{{cite web |title=Iceberg connector — Trino 393 Documentation |url=https://trino.io/docs/393/connector/iceberg.html |website=trino.io |access-date=25 August 2022}}</ref> table formats.
Trino also has the ability to run federated queries that query tables in different data sources such as [[MySQL]], [[PostgreSQL]], [[Apache Cassandra|Cassandra]], [[Apache Kafka|Kafka]], [[MongoDB]] and [[Elasticsearch]].<ref>{{cite web |title=Connectors — Trino 393 Documentation |url=https://trino.io/docs/393/connector.html |website=trino.io |access-date=25 August 2022}}</ref> Trino is released under the [[Apache License]].<ref>{{cite web |title=trinodb/trino LICENSE |url=https://github.com/trinodb/trino/blob/master/LICENSE |publisher=Trino |access-date=25 August 2022 |date=25 August 2022}}</ref>
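As a rough illustration of the federated queries described above, the following Python sketch builds a single SQL statement that joins tables from two different data sources. It relies on Trino's <code>catalog.schema.table</code> naming for tables. The catalog names (<code>mysql</code>, <code>hive</code>), the schemas, the tables, the join column, and both helper functions are hypothetical and chosen only for this example; the sketch only assembles the query text and does not connect to a Trino cluster.

```python
# Minimal sketch: compose a federated join across two Trino catalogs.
# All catalog, schema, table, and column names below are hypothetical.


def qualified_name(catalog: str, schema: str, table: str) -> str:
    """Return a fully qualified, double-quoted name: "catalog"."schema"."table"."""
    return ".".join(f'"{part}"' for part in (catalog, schema, table))


def build_federated_join(left: str, right: str, key: str) -> str:
    """Build a SQL join between two tables that may live in different catalogs."""
    return (
        f"SELECT l.*, r.* "
        f"FROM {left} AS l "
        f"JOIN {right} AS r ON l.{key} = r.{key}"
    )


# One table in a relational database, one in a data lake (assumed names).
orders = qualified_name("mysql", "shop", "orders")
events = qualified_name("hive", "weblogs", "events")

query = build_federated_join(orders, events, "customer_id")
print(query)
```

The resulting string could then be submitted through any Trino client. Which catalogs exist depends on the connectors configured for a given deployment.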
 
== History ==
 
In January 2019, the original creators of [[Presto (SQL query engine)|Presto]], Martin Traverso, Dain Sundstrom, and David Phillips, created a [[Fork (software development)|fork]] of the Presto project. They initially kept the name Presto and used the PrestoSQL web handle to distinguish it from the original PrestoDB project. Simultaneously, they announced the Presto Software Foundation. The foundation is a not-for-profit organization dedicated to the advancement of the Presto open source distributed SQL query engine.<ref name="2019psf">{{Cite web|url=https://www.prweb.com/releases/presto-software-foundation-launches-to-advance-presto-open-source-community-815915772.html|title=Presto Software Foundation Launches to Advance Presto Open Source Community|website=PRWeb|access-date=2019-02-01}}</ref><ref name="2019psf2">{{Cite web|url=https://thenewstack.io/prestos-new-foundation-signals-growth-for-the-big-data-sql-engine/|title=Presto's New Foundation Signals Growth for the Big Data SQL Engine|date=2019-01-31|website=The New Stack|language=en-US|access-date=2019-02-01}}</ref>
 
In December 2020, PrestoSQL was rebranded as Trino. The Trino Software Foundation, code base, and all other PrestoSQL assets were renamed as part of the rebrand.<ref name="2020rename">{{cite web |last1=Traverso |first1=Martin |last2=Sundstrom |first2=Dain |last3=Phillips |first3=David |title=We're rebranding PrestoSQL as Trino |url=https://trino.io/blog/2020/12/27/announcing-trino.html |website=trino.io |access-date=7 September 2021 |language=en |date=27 December 2020}}</ref>
Line 37:
* The workers are responsible for executing the tasks and operators fed to them by the scheduler. These tasks process rows from the data sources and produce results that are returned to the coordinator and ultimately back to the client.<ref name="trino-definitive-guide-ch4" />
 
Trino adheres to the [[ANSI]] [[SQL]]<ref name="trino-definitive-guide-ch1">{{cite book |last1=Fuller |first1=Matt |last2=Moser |first2=Manfred |last3=Traverso |first3=Martin |title=Trino: The Definitive Guide |chapter=Chapter 1. Introducing Trino |date=2021 |publisher=O'Reilly Media, Inc, USA |isbn=9781098107710 |pages=3–17}}</ref> standard and includes various parts of the following ANSI specifications: [[SQL-92]], [[SQL:1999]], [[SQL:2003]], [[SQL:2008]], [[SQL:2011]], [[SQL:2016]], [[SQL:2023]].
 
Trino supports the separation of compute and storage<ref name="trino-definitive-guide-ch1" /> and may be deployed both on-premises and in the [[Cloud computing|cloud]].<ref name="trino-definitive-guide-ch13">{{cite book |last1=Fuller |first1=Matt |last2=Moser |first2=Manfred |last3=Traverso |first3=Martin |title=Trino: The Definitive Guide |chapter=Chapter 13. Real-World Examples |date=2021 |publisher=O'Reilly Media, Inc, USA |isbn=9781098107710 |pages=267–272}}</ref>