Trino (SQL query engine): Difference between revisions

BattyBot (talk | contribs)
Use Cases: remove marketing section based exclusively on the official web page
Line 45:
 
Trino has a [[Distributed computing|distributed]] [[massively parallel|MPP]] architecture, a departure from the [[MapReduce]]-style execution used by data lake systems such as Hive and [[Apache Spark]]. Trino first distributes work across multiple workers, either by running ad-hoc partitioning operations or by relying on existing partitions in the underlying data store. Once the data reaches a worker, it is processed by pipelined operators running on multiple threads. Another defining characteristic of Trino is that it avoids the [[Application checkpointing|checkpointing]] operations, which involve expensive writes, used by systems like Hive and Spark. The trade-off is that a query may have to be restarted in the rare case of a failure during execution.
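
The execution model described above can be illustrated with a small sketch. It is not Trino code: the rows, the operator names, and the worker count are all hypothetical, and Python threads stand in for worker nodes. It assumes a query that sums positive amounts per region, hash-partitions the rows so that each key lands on one worker, and runs each partition through lazily chained operators without writing any intermediate result to disk.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from zlib import crc32

# Hypothetical input rows: (region, amount). Not taken from any real dataset.
ROWS = [("eu", 10), ("us", 5), ("eu", 7), ("apac", 3), ("us", 20), ("apac", -1)]


def partition(rows, num_workers):
    """Ad-hoc hash partitioning: rows with the same key go to the same worker."""
    parts = [[] for _ in range(num_workers)]
    for region, amount in rows:
        parts[crc32(region.encode()) % num_workers].append((region, amount))
    return parts


def scan(rows):
    # Source operator: yields one row at a time.
    yield from rows


def keep_positive(rows):
    # Filter operator: consumes rows as they are produced upstream.
    return (row for row in rows if row[1] > 0)


def sum_by_region(rows):
    # Aggregation operator: the only step that holds state, and only in memory.
    totals = defaultdict(int)
    for region, amount in rows:
        totals[region] += amount
    return dict(totals)


def run_worker(part):
    # Operators are chained lazily; no intermediate result is written to disk.
    return sum_by_region(keep_positive(scan(part)))


def run_query(rows, num_workers=3):
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(run_worker, partition(rows, num_workers)))
    result = {}
    for partial in partials:
        # Keys are disjoint across workers because of the hash partitioning.
        result.update(partial)
    return result


print(run_query(ROWS))
```

Because nothing is checkpointed in this sketch, an exception raised in any <code>run_worker</code> call leaves no saved state to resume from, so <code>run_query</code> would have to be called again from the start.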
 
== Use Cases ==
 
Trino is generally used for [[OLAP]] workloads rather than [[OLTP]] workloads.<ref>{{cite web |title=Use cases — Trino 361 Documentation |url=https://trino.io/docs/361/overview/use-cases.html |website=trino.io |access-date=20 September 2021}}</ref>
 
=== Data Lake Query Engine ===
 
Trino was originally created to replace the [[Apache Hive]] runtime while maintaining the ability to query data in [[Apache Hadoop#Hadoop distributed file system|HDFS]] or [[object storage]]. Many companies use Trino as a query engine to speed up analytics reads from the data lake.
 
=== Federated Query Engine ===
 
Trino can combine data from multiple sources in a single query. Through the [[service provider interface|SPI]], Trino connectors can query a range of data sources, including files in [[Apache Hadoop#HDFS|HDFS]], [[Amazon S3]], [[MySQL]], [[PostgreSQL]], [[Microsoft SQL Server]], [[Amazon Redshift]], [[Apache Kudu]], [[Apache Pinot]], [[Apache Kafka]], [[Apache Cassandra]], [[Apache Druid]], [[MongoDB]], [[Elasticsearch]], and [[Redis]]. Unlike [[Apache Impala]] and other earlier Hadoop-specific tools, Trino can work with any underlying system for which a connector exists.
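
The connector idea can be sketched in a few lines. The following is an illustration only: Trino's actual SPI is a Java API, and the <code>Connector</code> protocol, the class names, the catalog names <code>crm</code> and <code>lake</code>, and the sample rows here are all hypothetical. An in-memory SQLite database stands in for a relational source and a CSV string stands in for files in object storage; the engine-side function joins rows from both without knowing how either source stores them.

```python
import csv
import io
import sqlite3
from typing import Iterator, Protocol


class Connector(Protocol):
    """Hypothetical stand-in for a connector; Trino's real SPI is a Java API."""

    def scan(self, table: str) -> Iterator[dict]: ...


class SqliteConnector:
    """Stands in for a relational source such as MySQL or PostgreSQL."""

    def __init__(self) -> None:
        self.db = sqlite3.connect(":memory:")
        self.db.row_factory = sqlite3.Row
        self.db.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
        self.db.executemany(
            "INSERT INTO customers VALUES (?, ?)", [(1, "Ada"), (2, "Grace")]
        )

    def scan(self, table: str) -> Iterator[dict]:
        # Table name is interpolated only because this is a toy example.
        for row in self.db.execute(f"SELECT * FROM {table}"):
            yield dict(row)


class CsvConnector:
    """Stands in for files kept in HDFS or object storage."""

    FILES = {"orders": "customer_id,total\n1,30\n2,12\n1,8\n"}

    def scan(self, table: str) -> Iterator[dict]:
        yield from csv.DictReader(io.StringIO(self.FILES[table]))


def total_per_customer(catalogs: dict[str, Connector]) -> dict[str, int]:
    """Join rows from two catalogs, as a single federated query would."""
    names = {row["id"]: row["name"] for row in catalogs["crm"].scan("customers")}
    totals: dict[str, int] = {}
    for order in catalogs["lake"].scan("orders"):
        name = names[int(order["customer_id"])]
        totals[name] = totals.get(name, 0) + int(order["total"])
    return totals


CATALOGS = {"crm": SqliteConnector(), "lake": CsvConnector()}
print(total_per_customer(CATALOGS))
```

Adding another source in this sketch means writing one more class with a <code>scan</code> method; the join logic is unchanged, which is the property the connector approach is meant to provide.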
 
==See also==