Select (SQL): Difference between revisions

Browse history interactively

← Previous edit

Content deleted Content added

VisualWikitext

Revision as of 21:18, 25 November 2020 edit Federico Razzoli (talk \| contribs) 62 edits →Overview: mention that DISTINCT is a type of GROUP BY ← Previous edit		Latest revision as of 05:24, 13 August 2025 edit undo Cedar101 (talk \| contribs) Extended confirmed users 18,557 edits m →RANK() window function: {{mono}}
(45 intermediate revisions by 30 users not shown)
Line 1: {{excessive examples\|date=October 2024}} The [[SQL]] '''SELECT''' statement returns a [[result set]] of records, from one or more [[Table (database)\|tables]].<ref>{{cite web \|url=http://msdn2.microsoft.com/en-us/library/ms189499.aspx \|title=Transact-SQL Syntax Conventions \|accessdate= \|author=Microsoft }}</ref><ref>{{cite web \|url=http://dev.mysql.com/doc/refman/5.0/en/select.html\|title=SQL SELECT Syntax \|accessdate= \|author=MySQL}}</ref>▼ {{trivia\|date=October 2024}} {{Short description\|SQL statement that returns a result set of rows from one or more tables}} ▲The [[SQL]] '''SELECT''' statement returns a [[result set]] of ~~records~~rows, from one or more [[Table (database)\|tables]].<ref>{{cite web \|url=http://msdn2.microsoft.com/en-us/library/ms189499.aspx \|title=Transact-SQL Syntax Conventions \|~~accessdate~~author=Microsoft \|~~author~~date=~~Microsoft~~23 May 2023 }}</ref><ref>{{cite web \|url=http://dev.mysql.com/doc/refman/5.0/en/select.html\|title=SQL SELECT Syntax ~~\|accessdate=~~ \|author=MySQL}}</ref> A SELECT statement retrieves zero or more rows from one or more [[Database Tables\|database tables]] or database [[View (database)\|views]]. In most applications, <code>SELECT</code> is the most commonly used [[data manipulation language]] (DML) command. As SQL is a [[declarative programming]] language, <code>SELECT</code> queries specify a result set, but do not specify how to calculate it. The database translates the query into a "[[query plan]]" which may vary between executions, database versions and database software. This functionality is called the "[[query optimizer]]" as it is responsible for finding the best possible execution plan for the query, within applicable constraints. Line 5 ⟶ 9: The SELECT statement has many optional clauses: * <code>SELECT</code> list is the list of [[column (database)\|columns]] or SQL expressions to be returned by the query. This is approximately the [[relational algebra]] [[Projection_(relational_algebra)\|projection]] operation. * <code>[[From (SQL)\|FROM]]</code> specifies from which table to get the data.<ref>Omitting FROM clause isn't standard, but allowed by most major DBMSes.</ref>▼ * <code>[[Alias (SQL)\|AS]]</code> optionally provides an alias for each column or expression in the <code>SELECT</code> list. This is the relational algebra [[Rename_(relational_algebra)\|rename]] operation. * <code>[[Where (SQL)\|WHERE]]</code> specifies which rows to retrieve. ▲* <code>[[From (SQL)\|FROM]]</code> specifies from which table to get the data.<ref>Omitting FROM clause ~~isn't~~is not standard, but allowed by most major DBMSes.</ref> * <code>WHERE</code> specifies which rows to retrieve. This is approximately the relational algebra [[Selection_(relational_algebra)\|selection]] operation. * <code>[[Group by (SQL)\|GROUP BY]]</code> groups rows sharing a property so that an [[aggregate function]] can be applied to each group. * <code>[[Having (SQL)\|HAVING]]</code> selects among the groups defined by the GROUP BY clause. * <code>[[Order by (SQL)\|ORDER BY]]</code> specifies how to order the returned rows. * <code>[[Alias (SQL)\|AS]]</code> provides an alias which can be used to temporarily rename tables or columns. == Overview == <code>SELECT</code> is the most common operation in SQL, called "the query". <code>SELECT</code> retrieves data from one or more [[Table (database)\|table]]s, or expressions. Standard <code>SELECT</code> statements have no persistent effects on the database. Some non-standard implementations of <code>SELECT</code> can have persistent effects, such as the <code>SELECT INTO</code> syntax provided in some databases.<ref name="ms-sql-select-into">{{ cite book \| chapter = Transact-SQL Reference \| title = SQL Server Language Reference \| series = SQL Server 2005 Books Online \| publisher = Microsoft \| date = 2007-09-15 \| url = http://msdn.microsoft.com/en-us/library/ms188029.aspx \| ~~accessdate~~access-date = 2007-06-17 }}</ref> Queries allow the user to describe desired data, leaving the [[Database management system\|database management system (DBMS)]] to carry out [[query plan\|planning]], [[query optimizer\|optimizing]], and performing the physical operations necessary to produce that result as it chooses. A query includes a list of columns to include in the final result, normally immediately following the <code>SELECT</code> keyword. An asterisk ("<code></code>") can be used to specify that the query should return all columns of all the queried tables. <code>SELECT</code> is the most complex statement in SQL, with optional keywords and clauses that include: The <code>[[From (SQL)\|FROM]]</code> clause, which indicates the ~~table(s)~~tables to retrieve data from. The <code>FROM</code> clause can include optional <code>[[Join (SQL)\|JOIN]]</code> subclauses to specify the rules for joining tables. * The <code>~~[[Where (SQL)\|~~WHERE]]</code> clause includes a comparison predicate, which restricts the rows returned by the query. The <code>WHERE</code> clause eliminates all rows from the result set where the comparison predicate does not evaluate to True. * The <code>GROUP BY</code> clause projects rows having common values into a smaller set of rows. <code>GROUP BY</code> is often used in conjunction with SQL aggregation functions or to eliminate duplicate rows from a result set. The <code>WHERE</code> clause is applied before the <code>GROUP BY</code> clause. * The <code>[[Having (SQL)\|HAVING]]</code> clause includes a predicate used to filter rows resulting from the <code>GROUP BY</code> clause. Because it acts on the results of the <code>GROUP BY</code> clause, aggregation functions can be used in the <code>HAVING</code> clause predicate. * The <code>[[Order by (SQL)\|ORDER BY]]</code> clause identifies which ~~column[s]~~columns to use to sort the resulting data, and in which direction to sort them (ascending or descending). Without an <code>ORDER BY</code> clause, the order of rows returned by an SQL query is undefined. * The <code>DISTINCT</code> keyword<ref> {{cite book \| title = SAS 9.4 SQL Procedure User's Guide \| date=10 July 2013 \| url = https://books.google.com/books?id=ESjMAAAAQBAJ \| publisher = SAS Institute \| publication-date = 2013 \| page = 248 \| isbn = 9781612905686 \| ~~accessdate~~access-date = 2015-10-21 \| quote = Although the UNIQUE argument is identical to DISTINCT, it is not an ANSI standard. }} Line 52 ⟶ 57: \| page = 143 \| isbn = 9780074637081 \| ~~accessdate~~access-date = 2015-10-21 \| quote = [...] the keyword DISTINCT [...] eliminates the duplicates from the result set. }} </ref> </ref> Note that <code>DISTINCT</code> is an example of [[syntactic sugar]]: it is possible to achieve the same result by listing all selected columns (but not the aggregate functions, if any) in the <code>GROUP BY</code> clause. The following example of a <code>SELECT</code> query returns a list of expensive books. The query retrieves all rows from the ''Book'' table in which the ''price'' column contains a value greater than 100.00. The result is sorted in ascending order by ''title''. The asterisk () in the ''select list'' indicates that all columns of the ''Book'' table should be included in the result set. <~~source~~syntaxhighlight lang="sql"> SELECT FROM Book WHERE price > 100.00 ORDER BY title; </syntaxhighlight> ~~</source>~~ The example below demonstrates a query of multiple tables, grouping, and aggregation, by returning a list of books and the number of authors associated with each book. <~~source~~syntaxhighlight lang="sql"> SELECT Book.title AS Title, count() AS Authors FROM Book JOIN Book_author ON Book.isbn = Book_author.isbn GROUP BY Book.title; </syntaxhighlight> ~~</source>~~ Example output might resemble the following: Line 88 ⟶ 93: Under the precondition that ''isbn'' is the only common column name of the two tables and that a column named ''title'' only exists in the ''Book'' table, one could re-write the query above in the following form: <~~source~~syntaxhighlight lang="sql"> SELECT title, count() AS Authors Line 94 ⟶ 99: NATURAL JOIN Book_author GROUP BY title; </syntaxhighlight> ~~</source>~~ However, many{{quantify\|date=October 2015}} vendors either do not support this approach, or require certain column-naming conventions for natural joins to work effectively. Line 100 ⟶ 105: SQL includes operators and functions for calculating values on stored values. SQL allows the use of expressions in the ''select list'' to project data, as in the following example, which returns a list of books that cost more than 100.00 with an additional ''sales_tax'' column containing a sales tax figure calculated at 6% of the ''price''. <~~source~~syntaxhighlight lang="sql"> SELECT isbn, title, Line 108 ⟶ 113: WHERE price > 100.00 ORDER BY title; </syntaxhighlight> ~~</source>~~ === Subqueries === Queries can be nested so that the results of one query can be used in another query via a [[relational operator]] or aggregation function. A nested query is also known as a ''subquery''. While joins and other table operations provide computationally superior (i.e. faster) alternatives in many cases (all depending on implementation), the use of subqueries introduces a hierarchy in execution that can be useful or necessary. In the following example, the aggregation function <code>AVG</code> receives as input the result of a subquery: <~~source~~syntaxhighlight lang="sql"> SELECT isbn, title, Line 121 ⟶ 126: WHERE price < (SELECT AVG(price) FROM Book) ORDER BY title; </syntaxhighlight> ~~</source>~~ A subquery can use values from the outer query, in which case it is known as a [[correlated subquery]]. Since 1999 the SQL standard allows WITH clauses, i.e. named subqueries often called [[common table expression]]s (named and designed after the IBM DB2 version 2 implementation; Oracle calls these [[subquery factoring]]). CTEs can also be [[recursive]] by referring to themselves; [[Hierarchical and recursive queries in SQL\|the resulting mechanism]] allows tree or graph traversals (when represented as relations), and more generally [[fixpoint]] computations. === ~~Inline~~Derived ~~view~~table === AnA ~~Inline~~derived ~~view~~table is ~~the use of referencing an SQL~~a subquery in a FROM clause. Essentially, the ~~inline~~derived ~~view~~table is a subquery that can be selected from or joined to. Derived ~~Inline View~~table functionality allows the user to reference the subquery as a table. The ~~inline~~derived ~~view~~table also is referred to as aan ''~~derived~~inline ~~table~~view'' or a ''~~subselect''. Inline view functionality was introduced~~select in ~~Oracle~~from 9ilist''.~~<ref name="Inline View/Derived Table" />~~ In the following example, the SQL statement involves a join from the initial Books table to the ~~Inline~~derived ~~view~~table "Sales". This ~~inline~~derived ~~view~~table captures associated book sales information using the ISBN to join to the Books table. As a result, the ~~inline~~derived ~~view~~table provides the result set with additional columns (the number of items sold and the company that sold the books): <~~source~~syntaxhighlight lang="sql"> SELECT b.isbn, b.title, b.price, sales.items_sold, sales.company_nm FROM Book b Line 140 ⟶ 145: GROUP BY Company_Nm, ISBN) sales ON sales.isbn = b.isbn </syntaxhighlight> ~~</source>~~ == Examples == Line 248 ⟶ 253: ISO [[SQL:2008]] introduced the <code>FETCH FIRST</code> clause. According to PostgreSQL v.9 documentation, an ~~'''~~SQL ~~Window~~window function~~'''~~ ''"performs a calculation across a set of table rows that are somehow related to the current row''", in a way similar to aggregate functions.<ref>[https://www.postgresql.org/docs/9.1/static/tutorial-window.html PostgreSQL 9.1.24 Documentation - Chapter 3. Advanced Features]</ref> The name recalls signal processing [[window function \| window functions]]. A window function call always contains an '''OVER''' clause. === ROW_NUMBER() window function === <code>ROW_NUMBER() OVER</code> may be used for a ''simple table'' on the returned rows, e.g. to return no more than ten rows: <syntaxhighlight lang="~~sql~~tsql" highlight="3"> SELECT * FROM ( SELECT Line 265 ⟶ 270: </syntaxhighlight> {{mono\|ROW_NUMBER}} can be [[Nondeterministic algorithm\|non-deterministic]]: if ''sort_key'' is not unique, each time you run the query it is possible to get different row numbers assigned to any rows where ''sort_key'' is the same. When ''sort_key'' is unique, each row will always get a unique row number. === RANK() window function === The <code>RANK() OVER</code> window function acts like {{mono\|ROW_NUMBER}}, but may return more or less than ''n'' rows in case of tie conditions, e.g. to return the top-10 youngest persons: <syntaxhighlight lang="~~sql~~tsql" highlight="3"> SELECT * FROM ( SELECT Line 307 ⟶ 312: \| <syntaxhighlight lang="postgres" highlight="2">SELECT * FROM T LIMIT 10 OFFSET 20</syntaxhighlight> \| [[Netezza]], [[MySQL]], [[MariaDB]] (also supports the standard version, since version 10.6), [[SQL Anywhere\|SAP SQL Anywhere]], [[PostgreSQL]] (also supports the standard, since version 8.4), [[SQLite]], [[HSQLDB]], [[H2 (DBMS)\|H2]], [[Vertica]], [[Polyhedra DBMS\|Polyhedra]], [[Couchbase Server]], [[Snowflake Computing]], [[Virtuoso Universal Server\|OpenLink Virtuoso]] \|- \| <syntaxhighlight lang="sql" highlight="2">SELECT * from T Line 330 ⟶ 335: \|- \| <code>SELECT '''TOP 20, 10''' * FROM T</code> \| [[Virtuoso Universal Server\|OpenLink Virtuoso]] (skips 20, delivers next 10)<ref name="docs_9.19">{{Cite web \|title=9.19.10. The TOP SELECT Option \|author=OpenLink Software \|work=docs.openlinksw.com ~~\|date=~~ \|access-date=1 October 2019 \|url= http://docs.openlinksw.com/virtuoso/topselectoption/ \|language=en-US}}</ref> \|- Line 345 ⟶ 350: \| <syntaxhighlight lang="sql" highlight="2">SELECT * FROM T WHERE ID_T > 10 FETCH FIRST 10 ROWS ONLY</syntaxhighlight> \| [[IBM ~~DB2\|DB2~~Db2]] \|- \| <syntaxhighlight lang="sql" highlight="2">SELECT * FROM T WHERE ID_T > 20 FETCH FIRST 10 ROWS ONLY</syntaxhighlight> \| [[IBM ~~DB2\|DB2~~Db2]] (new rows are filtered after comparing with key column of table T) \|} Line 376 ⟶ 381: \|- \| <syntaxhighlight lang="~~tsql~~postgresql">select * from {table} order by {unique_key} FETCH FIRST {begin_base_0 + rows} ROWS ONLY</syntaxhighlight> \| SQL ANSI 2008<br>PostgreSQL<br>SQL Server 2012<br>Derby<br>Oracle 12c<br>DB2 12<br>Mimer SQL \|- \| Line 394 ⟶ 399: order by {unique_key}</syntaxhighlight> \| SQL Server 2005 \|- \| <syntaxhighlight lang="mysql">Select * from {table} order by {unique_key} ROWS LIMIT {begin_base_0 + rows}</syntaxhighlight> \| Sybase, ASE 16 SP2 \|- \| Line 429 ⟶ 441: OFFSET {begin_base_0} ROWS FETCH NEXT {rows} ROWS ONLY</syntaxhighlight> \| SQL ANSI 2008<br>PostgreSQL<br>SQL Server 2012<br>Derby<br>Oracle 12c<br>DB2 12<br>Mimer SQL \|- \| Line 444 ⟶ 456: LIMIT {begin_base_0}, {rows}</syntaxhighlight> \| MySQL<br>MariaDB<br>SQLite \|- \| <syntaxhighlight lang="mysql">Select * from {table} order by {unique_key} ROWS LIMIT {rows} OFFSET {begin_base_0}</syntaxhighlight> \| Sybase, ASE 16 SP2 \|- \| Line 517 ⟶ 536: \|- \| <syntaxhighlight lang="~~tsql~~postgresql">select * from {table} order by {unique_key} FETCH FIRST {rows} ROWS ONLY</syntaxhighlight> \| <syntaxhighlight lang="~~tsql~~postgresql">select * from {table} where {unique_key} > {last_val} Line 528 ⟶ 547: FETCH FIRST {rows} ROWS ONLY</syntaxhighlight> \| <syntaxhighlight lang="~~tsql~~postgresql">select * from ( select * Line 537 ⟶ 556: ) a order by {unique_key}</syntaxhighlight> \| SQL ANSI 2008<br>PostgreSQL<br>SQL Server 2012<br>Derby<br>Oracle 12c<br>DB2 12<br>Mimer SQL \|- \| Line 647 ⟶ 666: For example, {{sxhl\|2=tsql\| sum(population) OVER( PARTITION BY city ) }} calculates the sum of the populations of all rows having the same ''city'' value as the current row. Partitions are specified using the '''OVER''' clause which modifies the aggregate. Syntax: {{sxhl\|2=bnf\|1= <OVER_CLAUSE> :: = OVER ( [ PARTITION BY <expr>, ... ] [ ORDER BY <expression> ] ) }} The OVER clause can partition and order the result set. Ordering is used for order-relative functions such as row_number. Line 664 ⟶ 683: {{ordered list \|1=<syntaxhighlight lang="~~sql~~postgresql"> select g.* from users u inner join groups g on g.Userid = u.Userid Line 674 ⟶ 693: \|3= the ON clause is evaluated for vtable1; only records which meet the join condition g.Userid = u.Userid are inserted into Vtable2 \|4= If an outer join is specified, records which were dropped from vTable2 are added into VTable 3, for instance if the above query were: <syntaxhighlight lang="~~sql~~postgresql"> select u.* from users u left join groups g on g.Userid = u.Userid Line 682 ⟶ 701: \|5= the WHERE clause is evaluated, in this case only group information for user John Smith would be added to vTable4 \|6= the GROUP BY is evaluated; if the above query were: <syntaxhighlight lang="~~sql~~postgresql"> select g.GroupName, count(g.) as NumberOfMembers from users u inner join groups g on g.Userid = u.Userid Line 689 ⟶ 708: vTable5 would consist of members returned from vTable4 arranged by the grouping, in this case the GroupName \|7= the HAVING clause is evaluated for groups for which the HAVING clause is true and inserted into vTable6. For example: <syntaxhighlight lang="~~sql~~postgresql"> select g.GroupName, count(g.) as NumberOfMembers from users u inner join groups g on g.Userid = u.Userid Line 705 ⟶ 724: == Generating data in T-SQL == Method to generate data based on the union all <syntaxhighlight lang="~~sql~~tsql"> select 1 a, 1 b union all select 1, 2 union all Line 713 ⟶ 732: </syntaxhighlight> SQL Server 2008 supports the "row constructor" feature, specified in the ~~SQL3 ("~~[[SQL:1999")]] standard <syntaxhighlight lang="~~sql~~tsql"> select * from (values (1, 1), (1, 2), (1, 3), (2, 1), (5, 1)) as x(a, b)