Content deleted Content added
m →Method with filter (it is more sophisticated but necessary for very big dataset): lang="postgresql" |
m →RANK() window function: {{mono}} |
||
(30 intermediate revisions by 24 users not shown) | |||
Line 1:
{{excessive examples|date=October 2024}}
{{Short description|SQL statement that returns a result set of records from one or more tables}}▼
{{trivia|date=October 2024}}
The [[SQL]] '''SELECT''' statement returns a [[result set]] of records, from one or more [[Table (database)|tables]].<ref>{{cite web |url=http://msdn2.microsoft.com/en-us/library/ms189499.aspx |title=Transact-SQL Syntax Conventions |author=Microsoft }}</ref><ref>{{cite web |url=http://dev.mysql.com/doc/refman/5.0/en/select.html|title=SQL SELECT Syntax |author=MySQL}}</ref>▼
▲{{Short description|SQL statement that returns a result set of
▲The [[SQL]] '''SELECT''' statement returns a [[result set]] of
A SELECT statement retrieves zero or more rows from one or more [[Database Tables|database tables]] or database [[View (database)|views]]. In most applications, <code>SELECT</code> is the most commonly used [[data manipulation language]] (DML) command. As SQL is a [[declarative programming]] language, <code>SELECT</code> queries specify a result set, but do not specify how to calculate it. The database translates the query into a "[[query plan]]" which may vary between executions, database versions and database software. This functionality is called the "[[query optimizer]]" as it is responsible for finding the best possible execution plan for the query, within applicable constraints.
Line 6 ⟶ 9:
The SELECT statement has many optional clauses:
* <code>SELECT</code>
* <code>[[Alias (SQL)|AS]]</code> optionally provides an alias for each column or expression in the <code>SELECT</code>
* <code>[[From (SQL)|FROM]]</code> specifies from which table to get the data.<ref>Omitting FROM clause
* <code>
* <code>[[Group by (SQL)|GROUP BY]]</code> groups rows sharing a property so that an [[aggregate function]] can be applied to each group.
* <code>[[Having (SQL)|HAVING]]</code> selects among the groups defined by the GROUP BY clause.
Line 20 ⟶ 23:
Queries allow the user to describe desired data, leaving the [[Database management system|database management system (DBMS)]] to carry out [[query plan|planning]], [[query optimizer|optimizing]], and performing the physical operations necessary to produce that result as it chooses.
A query includes a list of columns to include in the final result, normally immediately following the <code>SELECT</code> keyword. An asterisk ("<code>*</code>") can be used to specify that the query should return all columns of all the queried tables. <code>SELECT</code> is the most complex statement in SQL, with optional keywords and clauses that include:
* The <code>[[From (SQL)|FROM]]</code> clause, which indicates the
* The <code>
* The <code>GROUP BY</code> clause projects rows having common values into a smaller set of rows. <code>GROUP BY</code> is often used in conjunction with SQL aggregation functions or to eliminate duplicate rows from a result set. The <code>WHERE</code> clause is applied before the <code>GROUP BY</code> clause.
* The <code>[[Having (SQL)|HAVING]]</code> clause includes a predicate used to filter rows resulting from the <code>GROUP BY</code> clause. Because it acts on the results of the <code>GROUP BY</code> clause, aggregation functions can be used in the <code>HAVING</code> clause predicate.
* The <code>[[Order by (SQL)|ORDER BY]]</code> clause identifies which
* The <code>DISTINCT</code> keyword<ref>
{{cite book
| title = SAS 9.4 SQL Procedure User's Guide
| date=10 July 2013 | url = https://books.google.com/books?id=ESjMAAAAQBAJ
| publisher = SAS Institute
| publication-date = 2013
Line 71 ⟶ 74:
<syntaxhighlight lang="sql">
SELECT Book.title AS Title,
count(*) AS Authors
ON Book.isbn = Book_author.isbn
GROUP BY Book.title;
</syntaxhighlight>
Line 114 ⟶ 117:
=== Subqueries ===
Queries can be nested so that the results of one query can be used in another query via a [[relational operator]] or aggregation function. A nested query is also known as a ''subquery''. While joins and other table operations provide computationally superior (i.e. faster) alternatives in many cases (all depending on implementation), the use of subqueries introduces a hierarchy in execution that can be useful or necessary. In the following example, the aggregation function <code>AVG</code> receives as input the result of a subquery:
<syntaxhighlight lang="sql">
Line 127 ⟶ 130:
A subquery can use values from the outer query, in which case it is known as a [[correlated subquery]].
Since 1999 the SQL standard allows WITH clauses, i.e. named subqueries often called [[common table expression]]s (named and designed after the IBM DB2 version 2 implementation; Oracle calls these [[subquery factoring]]). CTEs can also be [[recursive]] by referring to themselves; [[Hierarchical and recursive queries in SQL|the resulting mechanism]] allows tree or graph traversals (when represented as relations), and more generally [[fixpoint]] computations.
=== Derived table ===
A derived table is
In the following example, the SQL statement involves a join from the initial Books table to the derived table "Sales". This derived table captures associated book sales information using the ISBN to join to the Books table. As a result, the derived table provides the result set with additional columns (the number of items sold and the company that sold the books):
Line 256 ⟶ 259:
<code>ROW_NUMBER() OVER</code> may be used for a ''simple table'' on the returned rows, e.g. to return no more than ten rows:
<syntaxhighlight lang="
SELECT * FROM
( SELECT
Line 267 ⟶ 270:
</syntaxhighlight>
{{mono|ROW_NUMBER}} can be [[Nondeterministic algorithm|non-deterministic]]: if ''sort_key'' is not unique, each time you run the query it is possible to get different row numbers assigned to any rows where ''sort_key'' is the same. When ''sort_key'' is unique, each row will always get a unique row number.
=== RANK() window function ===
The <code>RANK() OVER</code> window function acts like {{mono|ROW_NUMBER}}, but may return more or less than ''n'' rows in case of tie conditions, e.g. to return the top-10 youngest persons:
<syntaxhighlight lang="
SELECT * FROM (
SELECT
Line 309 ⟶ 312:
| <syntaxhighlight lang="postgres" highlight="2">SELECT * FROM T
LIMIT 10 OFFSET 20</syntaxhighlight>
| [[Netezza]], [[MySQL]], [[MariaDB]] (also supports the standard version, since version 10.6), [[SQL Anywhere|SAP SQL Anywhere]], [[PostgreSQL]] (also supports the standard, since version 8.4), [[SQLite]], [[HSQLDB]], [[H2 (DBMS)|H2]], [[Vertica]], [[Polyhedra DBMS|Polyhedra]], [[Couchbase Server]], [[Snowflake Computing]], [[Virtuoso Universal Server|OpenLink Virtuoso]]
|-
| <syntaxhighlight lang="sql" highlight="2">SELECT * from T
Line 347 ⟶ 350:
| <syntaxhighlight lang="sql" highlight="2">SELECT * FROM T
WHERE ID_T > 10 FETCH FIRST 10 ROWS ONLY</syntaxhighlight>
| [[IBM
|-
| <syntaxhighlight lang="sql" highlight="2">SELECT * FROM T
WHERE ID_T > 20 FETCH FIRST 10 ROWS ONLY</syntaxhighlight>
| [[IBM
|}
Line 396 ⟶ 399:
order by {unique_key}</syntaxhighlight>
| SQL Server 2005
|-
|
<syntaxhighlight lang="mysql">Select *
from {table}
order by {unique_key}
ROWS LIMIT {begin_base_0 + rows}</syntaxhighlight>
| Sybase, ASE 16 SP2
|-
|
Line 446 ⟶ 456:
LIMIT {begin_base_0}, {rows}</syntaxhighlight>
| MySQL<br>MariaDB<br>SQLite
|-
|
<syntaxhighlight lang="mysql">Select *
from {table}
order by {unique_key}
ROWS LIMIT {rows} OFFSET {begin_base_0}</syntaxhighlight>
| Sybase, ASE 16 SP2
|-
|
Line 649 ⟶ 666:
For example,
{{sxhl|2=tsql|
sum(population) OVER( PARTITION BY city )
}}
calculates the sum of the populations of all rows having the same ''city'' value as the current row.
Partitions are specified using the '''OVER''' clause which modifies the aggregate. Syntax:
{{sxhl|2=bnf|1=
<OVER_CLAUSE> :: =
OVER ( [ PARTITION BY <expr>, ... ]
[ ORDER BY <expression> ] )
}}
The OVER clause can partition and order the result set. Ordering is used for order-relative functions such as row_number.
Line 666 ⟶ 683:
{{ordered list
|1=<syntaxhighlight lang="
select g.*
from users u inner join groups g on g.Userid = u.Userid
Line 676 ⟶ 693:
|3= the ON clause is evaluated for vtable1; only records which meet the join condition g.Userid = u.Userid are inserted into Vtable2
|4= If an outer join is specified, records which were dropped from vTable2 are added into VTable 3, for instance if the above query were:
<syntaxhighlight lang="
select u.*
from users u left join groups g on g.Userid = u.Userid
Line 684 ⟶ 701:
|5= the WHERE clause is evaluated, in this case only group information for user John Smith would be added to vTable4
|6= the GROUP BY is evaluated; if the above query were:
<syntaxhighlight lang="
select g.GroupName, count(g.*) as NumberOfMembers
from users u inner join groups g on g.Userid = u.Userid
Line 691 ⟶ 708:
vTable5 would consist of members returned from vTable4 arranged by the grouping, in this case the GroupName
|7= the HAVING clause is evaluated for groups for which the HAVING clause is true and inserted into vTable6. For example:
<syntaxhighlight lang="
select g.GroupName, count(g.*) as NumberOfMembers
from users u inner join groups g on g.Userid = u.Userid
Line 707 ⟶ 724:
== Generating data in T-SQL ==
Method to generate data based on the union all
<syntaxhighlight lang="
select 1 a, 1 b union all
select 1, 2 union all
Line 715 ⟶ 732:
</syntaxhighlight>
SQL Server 2008 supports the "row constructor" feature, specified in the
<syntaxhighlight lang="
select *
from (values (1, 1), (1, 2), (1, 3), (2, 1), (5, 1)) as x(a, b)
|