Select (SQL): Difference between revisions

Content deleted Content added
change source to syntaxhighlight
 
(42 intermediate revisions by 28 users not shown)
Line 1:
{{excessive examples|date=October 2024}}
The [[SQL]] '''SELECT''' statement returns a [[result set]] of records, from one or more [[Table (database)|tables]].<ref>{{cite web |url=http://msdn2.microsoft.com/en-us/library/ms189499.aspx |title=Transact-SQL Syntax Conventions |accessdate= |author=Microsoft }}</ref><ref>{{cite web |url=http://dev.mysql.com/doc/refman/5.0/en/select.html|title=SQL SELECT Syntax |accessdate= |author=MySQL}}</ref>
{{trivia|date=October 2024}}
 
{{Short description|SQL statement that returns a result set of rows from one or more tables}}
The [[SQL]] '''SELECT''' statement returns a [[result set]] of recordsrows, from one or more [[Table (database)|tables]].<ref>{{cite web |url=http://msdn2.microsoft.com/en-us/library/ms189499.aspx |title=Transact-SQL Syntax Conventions |accessdateauthor=Microsoft |authordate=Microsoft23 May 2023 }}</ref><ref>{{cite web |url=http://dev.mysql.com/doc/refman/5.0/en/select.html|title=SQL SELECT Syntax |accessdate= |author=MySQL}}</ref>
 
A SELECT statement retrieves zero or more rows from one or more [[Database Tables|database tables]] or database [[View (database)|views]]. In most applications, <code>SELECT</code> is the most commonly used [[data manipulation language]] (DML) command. As SQL is a [[declarative programming]] language, <code>SELECT</code> queries specify a result set, but do not specify how to calculate it. The database translates the query into a "[[query plan]]" which may vary between executions, database versions and database software. This functionality is called the "[[query optimizer]]" as it is responsible for finding the best possible execution plan for the query, within applicable constraints.
Line 5 ⟶ 9:
The SELECT statement has many optional clauses:
 
* <code>SELECT</code> clauselist is the list of [[column (database)|columns]] or SQL expressions that mustto be returned by the query. This is approximately the [[relational algebra]] [[Projection_(relational_algebra)|projection]] operation.
* <code>[[Alias (SQL)|AS]]</code> optionally provides an alias for each column or expression in the <code>SELECT</code> clauselist. This is the relational algebra [[Rename_(relational_algebra)|rename]] operation.
* <code>[[From (SQL)|FROM]]</code> specifies from which table to get the data.<ref>Omitting FROM clause isn'tis not standard, but allowed by most major DBMSes.</ref>
* <code>[[Where (SQL)|WHERE]]</code> specifies which rows to retrieve. This is approximately the relational algebra [[Selection_(relational_algebra)|selection]] operation.
* <code>[[Group by (SQL)|GROUP BY]]</code> groups rows sharing a property so that an [[aggregate function]] can be applied to each group.
* <code>[[Having (SQL)|HAVING]]</code> selects among the groups defined by the GROUP BY clause.
Line 15 ⟶ 19:
== Overview ==
 
<code>SELECT</code> is the most common operation in SQL, called "the query". <code>SELECT</code> retrieves data from one or more [[Table (database)|table]]s, or expressions. Standard <code>SELECT</code> statements have no persistent effects on the database. Some non-standard implementations of <code>SELECT</code> can have persistent effects, such as the <code>SELECT INTO</code> syntax provided in some databases.<ref name="ms-sql-select-into">{{ cite book | chapter = Transact-SQL Reference | title = SQL Server Language Reference | series = SQL Server 2005 Books Online | publisher = Microsoft | date = 2007-09-15 | url = http://msdn.microsoft.com/en-us/library/ms188029.aspx | accessdateaccess-date = 2007-06-17 }}</ref>
 
Queries allow the user to describe desired data, leaving the [[Database management system|database management system (DBMS)]] to carry out [[query plan|planning]], [[query optimizer|optimizing]], and performing the physical operations necessary to produce that result as it chooses.
 
A query includes a list of columns to include in the final result, normally immediately following the <code>SELECT</code> keyword. An asterisk ("<code>*</code>") can be used to specify that the query should return all columns of all the queried tables. <code>SELECT</code> is the most complex statement in SQL, with optional keywords and clauses that include:
 
* The <code>[[From (SQL)|FROM]]</code> clause, which indicates the table(s)tables to retrieve data from. The <code>FROM</code> clause can include optional <code>[[Join (SQL)|JOIN]]</code> subclauses to specify the rules for joining tables.
* The <code>[[Where (SQL)|WHERE]]</code> clause includes a comparison predicate, which restricts the rows returned by the query. The <code>WHERE</code> clause eliminates all rows from the result set where the comparison predicate does not evaluate to True.
* The <code>GROUP BY</code> clause projects rows having common values into a smaller set of rows. <code>GROUP BY</code> is often used in conjunction with SQL aggregation functions or to eliminate duplicate rows from a result set. The <code>WHERE</code> clause is applied before the <code>GROUP BY</code> clause.
* The <code>[[Having (SQL)|HAVING]]</code> clause includes a predicate used to filter rows resulting from the <code>GROUP BY</code> clause. Because it acts on the results of the <code>GROUP BY</code> clause, aggregation functions can be used in the <code>HAVING</code> clause predicate.
* The <code>[[Order by (SQL)|ORDER BY]]</code> clause identifies which column[s]columns to use to sort the resulting data, and in which direction to sort them (ascending or descending). Without an <code>ORDER BY</code> clause, the order of rows returned by an SQL query is undefined.
* The <code>DISTINCT</code> keyword<ref>
{{cite book
| title = SAS 9.4 SQL Procedure User's Guide
| date=10 July 2013 | url = https://books.google.com/books?id=ESjMAAAAQBAJ
| publisher = SAS Institute
| publication-date = 2013
| page = 248
| isbn = 9781612905686
| accessdateaccess-date = 2015-10-21
| quote = Although the UNIQUE argument is identical to DISTINCT, it is not an ANSI standard.
}}
Line 53 ⟶ 57:
| page = 143
| isbn = 9780074637081
| accessdateaccess-date = 2015-10-21
| quote = [...] the keyword DISTINCT [...] eliminates the duplicates from the result set.
}}
</ref>
</ref> Note that <code>DISTINCT</code> is an example of [[syntactic sugar]]: it is possible to achieve the same result by listing all selected columns (but not the aggregate functions, if any) in the <code>GROUP BY</code> clause.
 
The following example of a <code>SELECT</code> query returns a list of expensive books. The query retrieves all rows from the ''Book'' table in which the ''price'' column contains a value greater than 100.00. The result is sorted in ascending order by ''title''. The asterisk (*) in the ''select list'' indicates that all columns of the ''Book'' table should be included in the result set.
Line 70 ⟶ 74:
 
<syntaxhighlight lang="sql">
SELECT Book.title AS Title,
count(*) AS Authors
FROM Book
JOIN Book_author
ON Book.isbn = Book_author.isbn
GROUP BY Book.title;
</syntaxhighlight>
Line 113 ⟶ 117:
=== Subqueries ===
 
Queries can be nested so that the results of one query can be used in another query via a [[relational operator]] or aggregation function. A nested query is also known as a ''subquery''. While joins and other table operations provide computationally superior (i.e. faster) alternatives in many cases (all depending on implementation), the use of subqueries introduces a hierarchy in execution that can be useful or necessary. In the following example, the aggregation function <code>AVG</code> receives as input the result of a subquery:
 
<syntaxhighlight lang="sql">
Line 126 ⟶ 130:
A subquery can use values from the outer query, in which case it is known as a [[correlated subquery]].
 
Since 1999 the SQL standard allows WITH clauses, i.e. named subqueries often called [[common table expression]]s (named and designed after the IBM DB2 version 2 implementation; Oracle calls these [[subquery factoring]]). CTEs can also be [[recursive]] by referring to themselves; [[Hierarchical and recursive queries in SQL|the resulting mechanism]] allows tree or graph traversals (when represented as relations), and more generally [[fixpoint]] computations.
 
=== InlineDerived viewtable ===
 
AnA Inlinederived viewtable is the use of referencing an SQLa subquery in a FROM clause. Essentially, the inlinederived viewtable is a subquery that can be selected from or joined to. Derived Inline Viewtable functionality allows the user to reference the subquery as a table. The inlinederived viewtable also is referred to as aan ''derivedinline tableview'' or a ''subselect''. Inline view functionality was introducedselect in Oraclefrom 9ilist''.<ref name="Inline View/Derived Table" />
 
In the following example, the SQL statement involves a join from the initial Books table to the Inlinederived viewtable "Sales". This inlinederived viewtable captures associated book sales information using the ISBN to join to the Books table. As a result, the inlinederived viewtable provides the result set with additional columns (the number of items sold and the company that sold the books):
 
<syntaxhighlight lang="sql">
Line 249 ⟶ 253:
ISO [[SQL:2008]] introduced the <code>FETCH FIRST</code> clause.
 
According to PostgreSQL v.9 documentation, an '''SQL Windowwindow function''' ''"performs a calculation across a set of table rows that are somehow related to the current row''", in a way similar to aggregate functions.<ref>[https://www.postgresql.org/docs/9.1/static/tutorial-window.html PostgreSQL 9.1.24 Documentation - Chapter 3. Advanced Features]</ref>
The name recalls signal processing [[window function | window functions]]. A window function call always contains an '''OVER''' clause.
 
=== ROW_NUMBER() window function ===
<code>ROW_NUMBER() OVER</code> may be used for a ''simple table'' on the returned rows, e.g. to return no more than ten rows:
 
<syntaxhighlight lang="sqltsql" highlight="3">
SELECT * FROM
( SELECT
Line 266 ⟶ 270:
</syntaxhighlight>
 
{{mono|ROW_NUMBER}} can be [[Nondeterministic algorithm|non-deterministic]]: if ''sort_key'' is not unique, each time you run the query it is possible to get different row numbers assigned to any rows where ''sort_key'' is the same. When ''sort_key'' is unique, each row will always get a unique row number.
 
=== RANK() window function ===
The <code>RANK() OVER</code> window function acts like {{mono|ROW_NUMBER}}, but may return more or less than ''n'' rows in case of tie conditions, e.g. to return the top-10 youngest persons:
 
<syntaxhighlight lang="sqltsql" highlight="3">
SELECT * FROM (
SELECT
Line 308 ⟶ 312:
| <syntaxhighlight lang="postgres" highlight="2">SELECT * FROM T
LIMIT 10 OFFSET 20</syntaxhighlight>
| [[Netezza]], [[MySQL]], [[MariaDB]] (also supports the standard version, since version 10.6), [[SQL Anywhere|SAP SQL Anywhere]], [[PostgreSQL]] (also supports the standard, since version 8.4), [[SQLite]], [[HSQLDB]], [[H2 (DBMS)|H2]], [[Vertica]], [[Polyhedra DBMS|Polyhedra]], [[Couchbase Server]], [[Snowflake Computing]], [[Virtuoso Universal Server|OpenLink Virtuoso]]
|-
| <syntaxhighlight lang="sql" highlight="2">SELECT * from T
Line 331 ⟶ 335:
|-
| <code>SELECT '''TOP 20, 10''' * FROM T</code>
| [[Virtuoso Universal Server|OpenLink Virtuoso]] (skips 20, delivers next 10)<ref name="docs_9.19">{{Cite web |title=9.19.10. The TOP SELECT Option |author=OpenLink Software |work=docs.openlinksw.com |date= |access-date=1 October 2019 |url= http://docs.openlinksw.com/virtuoso/topselectoption/ |language=en-US}}</ref>
 
|-
Line 346 ⟶ 350:
| <syntaxhighlight lang="sql" highlight="2">SELECT * FROM T
WHERE ID_T > 10 FETCH FIRST 10 ROWS ONLY</syntaxhighlight>
| [[IBM DB2|DB2Db2]]
|-
| <syntaxhighlight lang="sql" highlight="2">SELECT * FROM T
WHERE ID_T > 20 FETCH FIRST 10 ROWS ONLY</syntaxhighlight>
| [[IBM DB2|DB2Db2]] (new rows are filtered after comparing with key column of table T)
|}
 
Line 377 ⟶ 381:
|-
|
<syntaxhighlight lang="tsqlpostgresql">select *
from {table}
order by {unique_key}
FETCH FIRST {begin_base_0 + rows} ROWS ONLY</syntaxhighlight>
| SQL ANSI 2008<br>PostgreSQL<br>SQL Server 2012<br>Derby<br>Oracle 12c<br>DB2 12<br>Mimer SQL
|-
|
Line 395 ⟶ 399:
order by {unique_key}</syntaxhighlight>
| SQL Server 2005
|-
|
<syntaxhighlight lang="mysql">Select *
from {table}
order by {unique_key}
ROWS LIMIT {begin_base_0 + rows}</syntaxhighlight>
| Sybase, ASE 16 SP2
|-
|
Line 430 ⟶ 441:
OFFSET {begin_base_0} ROWS
FETCH NEXT {rows} ROWS ONLY</syntaxhighlight>
| SQL ANSI 2008<br>PostgreSQL<br>SQL Server 2012<br>Derby<br>Oracle 12c<br>DB2 12<br>Mimer SQL
|-
|
Line 445 ⟶ 456:
LIMIT {begin_base_0}, {rows}</syntaxhighlight>
| MySQL<br>MariaDB<br>SQLite
|-
|
<syntaxhighlight lang="mysql">Select *
from {table}
order by {unique_key}
ROWS LIMIT {rows} OFFSET {begin_base_0}</syntaxhighlight>
| Sybase, ASE 16 SP2
|-
|
Line 518 ⟶ 536:
|-
|
<syntaxhighlight lang="tsqlpostgresql">select *
from {table}
order by {unique_key}
FETCH FIRST {rows} ROWS ONLY</syntaxhighlight>
|
<syntaxhighlight lang="tsqlpostgresql">select *
from {table}
where {unique_key} > {last_val}
Line 529 ⟶ 547:
FETCH FIRST {rows} ROWS ONLY</syntaxhighlight>
|
<syntaxhighlight lang="tsqlpostgresql">select *
from (
select *
Line 538 ⟶ 556:
) a
order by {unique_key}</syntaxhighlight>
| SQL ANSI 2008<br>PostgreSQL<br>SQL Server 2012<br>Derby<br>Oracle 12c<br>DB2 12<br>Mimer SQL
|-
|
Line 648 ⟶ 666:
 
For example,
{{sxhl|2=tsql|
 
sum(population) OVER( PARTITION BY city )
}}
 
calculates the sum of the populations of all rows having the same ''city'' value as the current row.
 
Partitions are specified using the '''OVER''' clause which modifies the aggregate. Syntax:
{{sxhl|2=bnf|1=
 
<OVER_CLAUSE> :: =
OVER ( [ PARTITION BY <expr>, ... ]
[ ORDER BY <expression> ] )
}}
 
The OVER clause can partition and order the result set. Ordering is used for order-relative functions such as row_number.
 
Line 665 ⟶ 683:
 
{{ordered list
|1=<syntaxhighlight lang="sqlpostgresql">
select g.*
from users u inner join groups g on g.Userid = u.Userid
Line 675 ⟶ 693:
|3= the ON clause is evaluated for vtable1; only records which meet the join condition g.Userid = u.Userid are inserted into Vtable2
|4= If an outer join is specified, records which were dropped from vTable2 are added into VTable 3, for instance if the above query were:
<syntaxhighlight lang="sqlpostgresql">
select u.*
from users u left join groups g on g.Userid = u.Userid
Line 683 ⟶ 701:
|5= the WHERE clause is evaluated, in this case only group information for user John Smith would be added to vTable4
|6= the GROUP BY is evaluated; if the above query were:
<syntaxhighlight lang="sqlpostgresql">
select g.GroupName, count(g.*) as NumberOfMembers
from users u inner join groups g on g.Userid = u.Userid
Line 690 ⟶ 708:
vTable5 would consist of members returned from vTable4 arranged by the grouping, in this case the GroupName
|7= the HAVING clause is evaluated for groups for which the HAVING clause is true and inserted into vTable6. For example:
<syntaxhighlight lang="sqlpostgresql">
select g.GroupName, count(g.*) as NumberOfMembers
from users u inner join groups g on g.Userid = u.Userid
Line 706 ⟶ 724:
== Generating data in T-SQL ==
Method to generate data based on the union all
<syntaxhighlight lang="sqltsql">
select 1 a, 1 b union all
select 1, 2 union all
Line 714 ⟶ 732:
</syntaxhighlight>
 
SQL Server 2008 supports the "row constructor" feature, specified in the SQL3 ("[[SQL:1999")]] standard
<syntaxhighlight lang="sqltsql">
select *
from (values (1, 1), (1, 2), (1, 3), (2, 1), (5, 1)) as x(a, b)