Trino (SQL query engine): Difference between revisions

Revision as of 18:08, 3 October 2022 by Smga3000 (150 edits): Creating a proper Trino page as it has diverged from the original Presto enough, and is notable enough that having its own page is helpful. Tags: Removed redirect, Disambiguation links added.
Latest revision as of 17:24, 27 December 2024 by LesterMartin (2 edits, minor): showing breadth of file formats, not just columnar ones.
(19 intermediate revisions by 16 users not shown)
Line 1:
{{Short description|Open-source distributed SQL query engine}}
{{Infobox software
| name = Trino
Line 15 ⟶ 16:
}}
'''Trino''' is an [[Open-source software|open-source]] distributed [[SQL]] query engine designed to query large data sets distributed over one or more heterogeneous data sources.<ref>{{cite web |title=Overview — Trino 468 Documentation |url=https://trino.io/docs/468/overview.html |website=trino.io |access-date=27 December 2024}}</ref> Trino can query [[data lake]]s that contain a variety of file formats, from simple row-oriented CSV and JSON data files to more performant [[Free and open source software|open]] [[Column-oriented DBMS|column-oriented]] data file formats like [[Apache ORC|ORC]] or [[Apache Parquet|Parquet]],<ref name="hive-connector" /><ref name="iceberg-connector" /> residing on different storage systems like [[Apache Hadoop#Hadoop distributed file system|HDFS]], [[Amazon S3|AWS S3]], [[Google Cloud Storage]], or [[Microsoft Azure#Storage services|Azure Blob Storage]]<ref name="trino-definitive-guide-ch1" /> using the [[Apache Hive|Hive]]<ref name="hive-connector">{{cite web |title=Hive connector — Trino 393 Documentation |url=https://trino.io/docs/393/connector/hive.html |website=trino.io}}</ref> and [[List of Apache Software Foundation projects#Active projects|Iceberg]]<ref name="iceberg-connector">{{cite web |title=Iceberg connector — Trino 393 Documentation |url=https://trino.io/docs/393/connector/iceberg.html |website=trino.io |access-date=25 August 2022}}</ref> table formats.
Trino can also run federated queries across tables in different data sources such as [[MySQL]], [[PostgreSQL]], [[Apache Cassandra|Cassandra]], [[Apache Kafka|Kafka]], [[MongoDB]] and [[Elasticsearch]].<ref>{{cite web |title=Connectors — Trino 393 Documentation |url=https://trino.io/docs/393/connector.html |website=trino.io |access-date=25 August 2022}}</ref>

Trino is released under the [[Apache License]].<ref>{{cite web |title=trinodb/trino LICENSE |url=https://github.com/trinodb/trino/blob/master/LICENSE |publisher=Trino |access-date=25 August 2022 |date=25 August 2022}}</ref>

== History ==
In January 2019, the original creators of [[Presto (SQL query engine)|Presto]], Martin Traverso, Dain Sundstrom, and David Phillips, created a [[Fork (software development)|fork]] of the Presto project. They initially kept the name Presto and used the PrestoSQL web handle to distinguish it from the original PrestoDB project. Simultaneously, they announced the Presto Software Foundation. The foundation is a not-for-profit organization dedicated to the advancement of the Presto open source distributed SQL query engine.<ref name="2019psf">{{Cite web|url=https://www.prweb.com/releases/presto-software-foundation-launches-to-advance-presto-open-source-community-815915772.html|title=Presto Software Foundation Launches to Advance Presto Open Source Community|website=PRWeb|access-date=2019-02-01}}</ref><ref name="2019psf2">{{Cite web|url=https://thenewstack.io/prestos-new-foundation-signals-growth-for-the-big-data-sql-engine/|title=Presto's New Foundation Signals Growth for the Big Data SQL Engine|date=2019-01-31|website=The New Stack|language=en-US|access-date=2019-02-01}}</ref>

In December 2020, PrestoSQL was rebranded as Trino.
The Trino Software Foundation, code base, and all other PrestoSQL assets were renamed as part of the rebrand.<ref name="2020rename">{{cite web |last1=Traverso |first1=Martin |last2=Sundstrom |first2=Dain |last3=Phillips |first3=David |title=We're rebranding PrestoSQL as Trino |url=https://trino.io/blog/2020/12/27/announcing-trino.html |website=trino.io |access-date=7 September 2021 |language=en |date=27 December 2020}}</ref>

Presto and Trino were originally designed and developed by Martin Traverso, Dain Sundstrom, David Phillips, and Eric Hwang at [[Facebook]] to allow data analysts to run interactive queries on its large [[data warehouse]] in [[Apache Hadoop]]. Trino shares the first six years of development with the Presto project.<ref>{{cite web |title=Contributors to trinodb/trino |url=https://github.com/trinodb/trino/graphs/contributors?from=2012-08-05&to=2018-08-05&type=c |website=GitHub |access-date=20 September 2021 |language=en}}</ref><ref>{{cite web |title=Contributors to prestodb/presto |url=https://github.com/prestodb/presto/graphs/contributors?from=2012-08-05&to=2018-08-05&type=c |website=GitHub |access-date=20 September 2021 |language=en}}</ref> For the earlier history of Trino, see [[Presto (SQL query engine)#History|the Presto history section]].

Trino is used in many data platforms and products from cloud providers and other vendors. These products range from pure Trino deployments to heavily customized systems that run a data platform, or integrations into specialized data platforms for use with specific data. [https://trino.io/users Examples include Amazon Athena, Starburst Galaxy, Dune, and many others.]
== Architecture ==
[[File:Figure 4-1 Trino architecture.png|thumb|Trino architecture overview with coordinator and workers<ref name="trino-definitive-guide-ch4">{{cite book |last1=Fuller |first1=Matt |last2=Moser |first2=Manfred |last3=Traverso |first3=Martin |title=Trino: The Definitive Guide |chapter=Chapter 4. Trino Architecture |date=2021 |publisher=O'Reilly Media, Inc, USA |isbn=9781098107710 |pages=43–72}}</ref>]]
Trino is written in [[Java (programming language)|Java]].<ref name="trino-definitive-guide-ch2">{{cite book |last1=Fuller |first1=Matt |last2=Moser |first2=Manfred |last3=Traverso |first3=Martin |title=Trino: The Definitive Guide |chapter=Chapter 2. Installing and Configuring Trino |date=2021 |publisher=O'Reilly Media, Inc, USA |isbn=9781098107710 |pages=19–24}}</ref> It runs on a cluster of servers that contains two types of nodes, a '''coordinator''' and a '''worker'''.<ref name="trino-definitive-guide-ch4" />

* The coordinator is responsible for parsing, analyzing, optimizing, planning, and scheduling a query submitted by a client. The coordinator interacts with the [[service provider interface]] (SPI) to obtain the available tables, table statistics, and other information needed to carry out its tasks.<ref name="trino-definitive-guide-ch4" />
* The workers are responsible for executing the tasks and operators fed to them by the scheduler. These tasks process rows from the data sources and produce results that are returned to the coordinator and ultimately back to the client.<ref name="trino-definitive-guide-ch4" />

Trino adheres to the [[ANSI]] [[SQL]]<ref name="trino-definitive-guide-ch1">{{cite book |last1=Fuller |first1=Matt |last2=Moser |first2=Manfred |last3=Traverso |first3=Martin |title=Trino: The Definitive Guide |chapter=Chapter 1. Introducing Trino |date=2021 |publisher=O'Reilly Media, Inc, USA |isbn=9781098107710 |pages=3–17}}</ref> standard and includes various parts of the following ANSI specifications: [[SQL-92]], [[SQL:1999]], [[SQL:2003]], [[SQL:2008]], [[SQL:2011]], [[SQL:2016]], [[SQL:2023]].
Trino supports the separation of compute and storage<ref name="trino-definitive-guide-ch1" /> and may be deployed both on-premises and in the [[Cloud computing|cloud]].<ref name="trino-definitive-guide-ch13">{{cite book |last1=Fuller |first1=Matt |last2=Moser |first2=Manfred |last3=Traverso |first3=Martin |title=Trino: The Definitive Guide |chapter=Chapter 13. Real-World Examples |date=2021 |publisher=O'Reilly Media, Inc, USA |isbn=9781098107710 |pages=267–272}}</ref>

Trino has a [[Distributed computing|distributed]] [[massively parallel|MPP]] architecture.<ref name="trino-definitive-guide-ch4" /> Trino first distributes work over multiple workers by running ad-hoc partitioning operations or relying on existing partitions in the data of the underlying data store. Once the data reaches a worker, it is processed by pipelined operators running on multiple threads.<ref name="trino-definitive-guide-ch4" />

== See also ==
Line 47 ⟶ 48:
* [[Data Intensive Computing]]
* [[Apache Drill]]
* [[Computer cluster]]

== References ==
{{Reflist}}

== External links ==
Line 56 ⟶ 57:
* [https://trino.io/foundation.html Trino Software Foundation (formerly Presto Software Foundation)]

[[:Category:SQL]]
[[:Category:Free system software]]
[[:Category:Hadoop]]
[[:Category:Cloud platforms]]
[[:Category:Java platform]]