Trino (SQL query engine)
Trino is written in [[Java (programming language)|Java]]. A Trino cluster consists of two types of nodes: a '''coordinator''' and one or more '''workers'''.
 
* The coordinator is responsible for parsing, analyzing, optimizing, planning, and scheduling a query submitted by a client. The coordinator interacts with the [[service provider interface]] (SPI) to obtain the available tables, obtain table statistics, check permissions, and gather other information needed to carry out its tasks.
 
* The workers are responsible for executing the tasks and operators fed to them by the scheduler. These tasks process rows from data sources and produce results that are returned to the coordinator and ultimately back to the client.
Trino supports separation of compute and storage and may be deployed both on premises and in the [[Cloud computing|cloud]].
 
Trino has a [[Distributed computing|distributed]] [[massively parallel|MPP]] architecture, a significant departure from the [[MapReduce]] design used by popular data lake systems such as Hive, Impala, and [[Apache Spark]]. Trino first distributes work over multiple workers by running ad-hoc partitioning operations or by relying on existing partitions in the underlying data store. Once the data reaches a worker, it is processed by pipelined operators running on multiple threads. Another defining characteristic of Trino is that it avoids the expensive disk writes of the [[Application checkpointing|checkpointing]] operations used by systems like Hive and Spark, which leaves Trino without [[fault tolerance]]: queries must be restarted if a failure occurs. In practice, this is not reported to happen often.
 
== Use cases ==
 
In general, Trino is intended for [[OLAP]] scenarios rather than [[OLTP]] workloads.<ref>{{cite web |title=Use cases — Trino 361 Documentation |url=https://trino.io/docs/361/overview/use-cases.html |website=trino.io |access-date=20 September 2021}}</ref>
 
=== Data lake query engine ===
=== Federated query engine ===
 
Trino can combine data from multiple sources in a single query. Using the [[service provider interface|SPI]], Trino connectors can query data sources, including files in [[Apache Hadoop#HDFS|HDFS]], [[Amazon S3]], [[MySQL]], [[PostgreSQL]], [[Microsoft SQL Server]], [[Amazon Redshift]], [[Apache Kudu]], [[Apache Pinot]], [[Apache Kafka]], [[Apache Cassandra]], [[Apache Druid]], [[MongoDB]], [[Elasticsearch]], and [[Redis]]. Unlike [[Apache Impala]] and other Hadoop-specific tools, Trino can work with any underlying system.
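A federated query joins tables from different connectors in a single statement, addressing each table by its fully qualified <code>catalog.schema.table</code> name. The following sketch is illustrative only: the catalog names (<code>hive</code>, <code>mysql</code>) and the table schemas are hypothetical examples of configured connectors, not part of any particular deployment.

<syntaxhighlight lang="sql">
-- Illustrative example: catalog names and table schemas are hypothetical.
-- Joins a table stored in HDFS (via the Hive connector)
-- with a table in a MySQL database, in one Trino query.
SELECT o.order_id, c.customer_name
FROM hive.sales.orders AS o
JOIN mysql.crm.customers AS c
  ON o.customer_id = c.customer_id;
</syntaxhighlight>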
 
==See also==