Select (SQL): Difference between revisions

Browse history interactively

← Previous edit

Content deleted Content added

VisualWikitext

Revision as of 15:27, 1 October 2021 edit Cedar101 (talk \| contribs) Extended confirmed users 18,557 edits m →Method with filter (it is more sophisticated but necessary for very big dataset): lang="postgresql" ← Previous edit		Latest revision as of 05:24, 13 August 2025 edit undo Cedar101 (talk \| contribs) Extended confirmed users 18,557 edits m →RANK() window function: {{mono}}
(30 intermediate revisions by 24 users not shown)
Line 1: {{excessive examples\|date=October 2024}} {{Short description\|SQL statement that returns a result set of records from one or more tables}}▼ {{trivia\|date=October 2024}} The [[SQL]] '''SELECT''' statement returns a [[result set]] of records, from one or more [[Table (database)\|tables]].<ref>{{cite web \|url=http://msdn2.microsoft.com/en-us/library/ms189499.aspx \|title=Transact-SQL Syntax Conventions \|author=Microsoft }}</ref><ref>{{cite web \|url=http://dev.mysql.com/doc/refman/5.0/en/select.html\|title=SQL SELECT Syntax \|author=MySQL}}</ref>▼ ▲{{Short description\|SQL statement that returns a result set of ~~records~~rows from one or more tables}} ▲The [[SQL]] '''SELECT''' statement returns a [[result set]] of ~~records~~rows, from one or more [[Table (database)\|tables]].<ref>{{cite web \|url=http://msdn2.microsoft.com/en-us/library/ms189499.aspx \|title=Transact-SQL Syntax Conventions \|author=Microsoft \|date=23 May 2023 }}</ref><ref>{{cite web \|url=http://dev.mysql.com/doc/refman/5.0/en/select.html\|title=SQL SELECT Syntax \|author=MySQL}}</ref> A SELECT statement retrieves zero or more rows from one or more [[Database Tables\|database tables]] or database [[View (database)\|views]]. In most applications, <code>SELECT</code> is the most commonly used [[data manipulation language]] (DML) command. As SQL is a [[declarative programming]] language, <code>SELECT</code> queries specify a result set, but do not specify how to calculate it. The database translates the query into a "[[query plan]]" which may vary between executions, database versions and database software. This functionality is called the "[[query optimizer]]" as it is responsible for finding the best possible execution plan for the query, within applicable constraints. Line 6 ⟶ 9: The SELECT statement has many optional clauses: * <code>SELECT</code> ~~clause~~list is the list of [[column (database)\|columns]] or SQL expressions ~~that must~~to be returned by the query. This is approximately the [[relational algebra]] [[Projection_(relational_algebra)\|projection]] operation. * <code>[[Alias (SQL)\|AS]]</code> optionally provides an alias for each column or expression in the <code>SELECT</code> ~~clause~~list. This is the relational algebra [[Rename_(relational_algebra)\|rename]] operation. * <code>[[From (SQL)\|FROM]]</code> specifies from which table to get the data.<ref>Omitting FROM clause ~~isn't~~is not standard, but allowed by most major DBMSes.</ref> * <code>~~[[Where (SQL)\|~~WHERE]]</code> specifies which rows to retrieve. This is approximately the relational algebra [[Selection_(relational_algebra)\|selection]] operation. * <code>[[Group by (SQL)\|GROUP BY]]</code> groups rows sharing a property so that an [[aggregate function]] can be applied to each group. * <code>[[Having (SQL)\|HAVING]]</code> selects among the groups defined by the GROUP BY clause. Line 20 ⟶ 23: Queries allow the user to describe desired data, leaving the [[Database management system\|database management system (DBMS)]] to carry out [[query plan\|planning]], [[query optimizer\|optimizing]], and performing the physical operations necessary to produce that result as it chooses. A query includes a list of columns to include in the final result, normally immediately following the <code>SELECT</code> keyword. An asterisk ("<code></code>") can be used to specify that the query should return all columns of all the queried tables. <code>SELECT</code> is the most complex statement in SQL, with optional keywords and clauses that include: The <code>[[From (SQL)\|FROM]]</code> clause, which indicates the ~~table(s)~~tables to retrieve data from. The <code>FROM</code> clause can include optional <code>[[Join (SQL)\|JOIN]]</code> subclauses to specify the rules for joining tables. * The <code>~~[[Where (SQL)\|~~WHERE]]</code> clause includes a comparison predicate, which restricts the rows returned by the query. The <code>WHERE</code> clause eliminates all rows from the result set where the comparison predicate does not evaluate to True. * The <code>GROUP BY</code> clause projects rows having common values into a smaller set of rows. <code>GROUP BY</code> is often used in conjunction with SQL aggregation functions or to eliminate duplicate rows from a result set. The <code>WHERE</code> clause is applied before the <code>GROUP BY</code> clause. * The <code>[[Having (SQL)\|HAVING]]</code> clause includes a predicate used to filter rows resulting from the <code>GROUP BY</code> clause. Because it acts on the results of the <code>GROUP BY</code> clause, aggregation functions can be used in the <code>HAVING</code> clause predicate. * The <code>[[Order by (SQL)\|ORDER BY]]</code> clause identifies which ~~column[s]~~columns to use to sort the resulting data, and in which direction to sort them (ascending or descending). Without an <code>ORDER BY</code> clause, the order of rows returned by an SQL query is undefined. * The <code>DISTINCT</code> keyword<ref> {{cite book \| title = SAS 9.4 SQL Procedure User's Guide \| date=10 July 2013 \| url = https://books.google.com/books?id=ESjMAAAAQBAJ \| publisher = SAS Institute \| publication-date = 2013 Line 71 ⟶ 74: <syntaxhighlight lang="sql"> SELECT Book.title AS Title, count() AS Authors FROM Book JOIN Book_author ON Book.isbn = Book_author.isbn GROUP BY Book.title; </syntaxhighlight> Line 114 ⟶ 117: === Subqueries === Queries can be nested so that the results of one query can be used in another query via a [[relational operator]] or aggregation function. A nested query is also known as a ''subquery''. While joins and other table operations provide computationally superior (i.e. faster) alternatives in many cases (all depending on implementation), the use of subqueries introduces a hierarchy in execution that can be useful or necessary. In the following example, the aggregation function <code>AVG</code> receives as input the result of a subquery: <syntaxhighlight lang="sql"> Line 127 ⟶ 130: A subquery can use values from the outer query, in which case it is known as a [[correlated subquery]]. Since 1999 the SQL standard allows WITH clauses, i.e. named subqueries often called [[common table expression]]s (named and designed after the IBM DB2 version 2 implementation; Oracle calls these [[subquery factoring]]). CTEs can also be [[recursive]] by referring to themselves; [[Hierarchical and recursive queries in SQL\|the resulting mechanism]] allows tree or graph traversals (when represented as relations), and more generally [[fixpoint]] computations. === Derived table === A derived table is ~~the use of referencing an SQL~~a subquery in a FROM clause. Essentially, the derived table is a subquery that can be selected from or joined to. Derived table functionality allows the user to reference the subquery as a table. The derived table also is referred to as an ''inline view'' or a ''select in from list''. In the following example, the SQL statement involves a join from the initial Books table to the derived table "Sales". This derived table captures associated book sales information using the ISBN to join to the Books table. As a result, the derived table provides the result set with additional columns (the number of items sold and the company that sold the books): Line 256 ⟶ 259: <code>ROW_NUMBER() OVER</code> may be used for a ''simple table'' on the returned rows, e.g. to return no more than ten rows: <syntaxhighlight lang="~~sql~~tsql" highlight="3"> SELECT FROM ( SELECT Line 267 ⟶ 270: </syntaxhighlight> {{mono\|ROW_NUMBER}} can be [[Nondeterministic algorithm\|non-deterministic]]: if ''sort_key'' is not unique, each time you run the query it is possible to get different row numbers assigned to any rows where ''sort_key'' is the same. When ''sort_key'' is unique, each row will always get a unique row number. === RANK() window function === The <code>RANK() OVER</code> window function acts like {{mono\|ROW_NUMBER}}, but may return more or less than ''n'' rows in case of tie conditions, e.g. to return the top-10 youngest persons: <syntaxhighlight lang="~~sql~~tsql" highlight="3"> SELECT * FROM ( SELECT Line 309 ⟶ 312: \| <syntaxhighlight lang="postgres" highlight="2">SELECT * FROM T LIMIT 10 OFFSET 20</syntaxhighlight> \| [[Netezza]], [[MySQL]], [[MariaDB]] (also supports the standard version, since version 10.6), [[SQL Anywhere\|SAP SQL Anywhere]], [[PostgreSQL]] (also supports the standard, since version 8.4), [[SQLite]], [[HSQLDB]], [[H2 (DBMS)\|H2]], [[Vertica]], [[Polyhedra DBMS\|Polyhedra]], [[Couchbase Server]], [[Snowflake Computing]], [[Virtuoso Universal Server\|OpenLink Virtuoso]] \|- \| <syntaxhighlight lang="sql" highlight="2">SELECT * from T Line 347 ⟶ 350: \| <syntaxhighlight lang="sql" highlight="2">SELECT * FROM T WHERE ID_T > 10 FETCH FIRST 10 ROWS ONLY</syntaxhighlight> \| [[IBM ~~DB2\|DB2~~Db2]] \|- \| <syntaxhighlight lang="sql" highlight="2">SELECT * FROM T WHERE ID_T > 20 FETCH FIRST 10 ROWS ONLY</syntaxhighlight> \| [[IBM ~~DB2\|DB2~~Db2]] (new rows are filtered after comparing with key column of table T) \|} Line 396 ⟶ 399: order by {unique_key}</syntaxhighlight> \| SQL Server 2005 \|- \| <syntaxhighlight lang="mysql">Select * from {table} order by {unique_key} ROWS LIMIT {begin_base_0 + rows}</syntaxhighlight> \| Sybase, ASE 16 SP2 \|- \| Line 446 ⟶ 456: LIMIT {begin_base_0}, {rows}</syntaxhighlight> \| MySQL<br>MariaDB<br>SQLite \|- \| <syntaxhighlight lang="mysql">Select * from {table} order by {unique_key} ROWS LIMIT {rows} OFFSET {begin_base_0}</syntaxhighlight> \| Sybase, ASE 16 SP2 \|- \| Line 649 ⟶ 666: For example, {{sxhl\|2=tsql\| sum(population) OVER( PARTITION BY city ) }} calculates the sum of the populations of all rows having the same ''city'' value as the current row. Partitions are specified using the '''OVER''' clause which modifies the aggregate. Syntax: {{sxhl\|2=bnf\|1= <OVER_CLAUSE> :: = OVER ( [ PARTITION BY <expr>, ... ] [ ORDER BY <expression> ] ) }} The OVER clause can partition and order the result set. Ordering is used for order-relative functions such as row_number. Line 666 ⟶ 683: {{ordered list \|1=<syntaxhighlight lang="~~sql~~postgresql"> select g.* from users u inner join groups g on g.Userid = u.Userid Line 676 ⟶ 693: \|3= the ON clause is evaluated for vtable1; only records which meet the join condition g.Userid = u.Userid are inserted into Vtable2 \|4= If an outer join is specified, records which were dropped from vTable2 are added into VTable 3, for instance if the above query were: <syntaxhighlight lang="~~sql~~postgresql"> select u.* from users u left join groups g on g.Userid = u.Userid Line 684 ⟶ 701: \|5= the WHERE clause is evaluated, in this case only group information for user John Smith would be added to vTable4 \|6= the GROUP BY is evaluated; if the above query were: <syntaxhighlight lang="~~sql~~postgresql"> select g.GroupName, count(g.) as NumberOfMembers from users u inner join groups g on g.Userid = u.Userid Line 691 ⟶ 708: vTable5 would consist of members returned from vTable4 arranged by the grouping, in this case the GroupName \|7= the HAVING clause is evaluated for groups for which the HAVING clause is true and inserted into vTable6. For example: <syntaxhighlight lang="~~sql~~postgresql"> select g.GroupName, count(g.) as NumberOfMembers from users u inner join groups g on g.Userid = u.Userid Line 707 ⟶ 724: == Generating data in T-SQL == Method to generate data based on the union all <syntaxhighlight lang="~~sql~~tsql"> select 1 a, 1 b union all select 1, 2 union all Line 715 ⟶ 732: </syntaxhighlight> SQL Server 2008 supports the "row constructor" feature, specified in the ~~SQL3 ("~~[[SQL:1999")]] standard <syntaxhighlight lang="~~sql~~tsql"> select * from (values (1, 1), (1, 2), (1, 3), (2, 1), (5, 1)) as x(a, b)