Join (SQL)

Una JOIN è un comando SQL che serve a combinare le righe di due o più tabelle di un database. Lo standard ANSI definisce alcune specifiche per il linguaggio SQL sul tipo di JOINda effettuare: INNER, OUTER, LEFT e RIGHT. In alcuni casi è possibile che una tabella possa essere combinata con se stessa, in questo caso si parlerà di self-join.

Tabelle di esempio

Tutti gli esempi che verranno riportati di seguito faranno uso delle seguenti due tabelle

Tabella Impiegati
Cognome	ID_dipartimento
Rossi	31
Bianchi	33
Mancini	33
Santoro	34
Monti	34
Grassi	Template:Null result

Tabella Dipartimenti
ID_dipartimento	Nome_dipartimento
31	Vendite
33	Tecnico
34	Risorse umane
35	Promozione

Dipartimenti.ID_dipartimento è la chiave primaria che ha una cardinalità 1:n nei confronti del campo Impiegati.ID_dipartimento

Nota: Il dipartimento "Promozione" della tabella "Dipartimenti" non ha alcuna corrispondenza nella tabella "Impiegati". Mentre l'impegato "Grassi" non è stato assegnato ad alcun dipartimento (Template:Null result).

Inner join

Una inner join crea un nuova tabella combinando i valori delle due tabelle di partenza (A and B) basandosi su una certa regola di confronto. La query compara ogni riga della tabella A con ciascuna riga della tabella B cercando di soddisfare la regola di confronto definita. Quando la regola di join viene soddisfatta, i valori di tutte le colonne delle tabelle A e B vengono combinate in un'unica riga nella costruzione della tabella risultante. La inner join è la forma di join usata più di frequente nelle applicazioni e rappresenta la modalità di default.

Per esempio, la seguente query unirà le due tabelle 'Impiegati' e 'Dipartimenti' usando la chiave primaria 'ID_dipartimento' che è la colonna presente in entrambe le tabelle. Quando ID_dipartimento corrisponde in entrambe le tabelle (ovvero la regola è soddisfatta), la query combinerà le colonne Cognome, ID_dipartimento e Nome_dipartimento in un'unica riga per la tabella risultante.

Esempio di inner join in forma esplicita:

SELECT *
FROM   Impiegati 
       INNER JOIN Dipartimenti 
          ON Impiegati.ID_dipartimento = Dipartimenti.ID_dipartimento

che equivale a:

SELECT *  
FROM   Impiegati, Dipartimenti 
WHERE  Impiegati.ID_dipartimento = Dipartimenti.ID_dipartimento

Risultato della query di Inner join:

Impiegati.Cognome	Impiegati.ID_dipartimento	Dipartimenti.Nome_dipartimento	Dipartimenti.ID_dipartimento
Santoro	34	Risorse umane	34
Bianchi	33	Tecnico	33
Monti	34	Risorse umane	34
Mancini	33	Tecnico	33
Rossi	31	Vendite	31

Nota l'impiegato "Grassi" e il dipartimento "Promozione" non sono presenti in quanto l'impiegato Grassi ha un Template:Null result mentre Promozione non compare in nessun impiegato. A volte come risultato finale si desidera avere anche i campi che non hanno corrispondenza: in tal caso è possibile usare la query di tipo Outer join.

Equi-join

LA equi-join, altrimenti conosciuta come equijoin, ha un particolare tipo di comparatore, detto theta join, che utilizza come metodo di verifica, solamente l'uguaglianza matematica come regola di confronto. Usando altri operatori di confronto (come ad esempio <) squalifica la join come equi-join. La query riportata di seguito utilizza una equi-join:

SELECT *
FROM   Impiegati 
       INNER JOIN Dipartimenti 
          ON Impiegati.ID_dipartimento = Dipartimenti.ID_dipartimento

SQL fornisce una scorciatoia per definire le equi-joins, attraverso la keyword USING

SELECT *
FROM   Impiegati 
       INNER JOIN Dipartimenti 
          USING (ID_dipartimento)

La keyword USING è supportata da MySQL, Oracle, PostgreSQL, SQLite, and DB2/400.

Natural join

Una natural join offre ulteriori specializzazioni di equi-joins. Solitamente la join confronta colonne di tabelle diverse che hanno lo stesso nome. La natural join fa proprio questo.

Nell'esempio che segue viene riportata la notazione natural join equivalente alla prima inner-join definita:

SELECT *
FROM   Impiegati NATURAL JOIN Dipartimenti

ID_dipartimento	Impiegati.Cognome	Dipartimenti.Nome_dipartimento
34	Monti	Risorse umane
33	Bianchi	Tecnico
34	Santoro	Risorse umane
33	Mancini	Tecnico
31	Rossi	Vendite

Cross join

Una cross join, cartesian join or product fornisce le basi (le infrastrutture informatiche) attraverso cui tutti i tipi di inner join operano. Il risultato di una cross join è il prodotto cartesiano di tutte le righe delle tabelle che concorrono alla query di join. È come dire che stiamo facendo una inner join senza impostare la regola di confronto o in cui la regola di confronto ritorna sempre vero.

Se A e B le tabelle di partenza, la cross join si scrive A × B.

Esempio di cross join esplicito:

SELECT *
FROM   Impiegati CROSS JOIN Dipartimenti

Altro esempio implicito di cross join:

SELECT *
FROM   Impiegati, Dipartimenti;

Impiegati.Cognome	Impiegati.ID_dipartimento	Dipartimenti.Nome_dipartimento	Dipartimenti.ID_dipartimento
Rossi	31	Vendite	31
Bianchi	33	Vendite	31
Mancini	33	Vendite	31
Monti	34	Vendite	31
Santoro	34	Vendite	31
Grassi	Template:Null result	Vendite	31
Rossi	31	Tecnico	33
Bianchi	33	Tecnico	33
Mancini	33	Tecnico	33
Monti	34	Tecnico	33
Santoro	34	Tecnico	33
Grassi	Template:Null result	Tecnico	33
Rossi	31	Risorse umane	34
Bianchi	33	Risorse umane	34
Mancini	33	Risorse umane	34
Monti	34	Risorse umane	34
Santoro	34	Risorse umane	34
Grassi	Template:Null result	Risorse umane	34
Rossi	31	Promozione	35
Bianchi	33	Promozione	35
Mancini	33	Promozione	35
Monti	34	Promozione	35
Santoro	34	Promozione	35
Grassi	Template:Null result	Promozione	35

Outer joins

Una outer join non richiede che ci sia corrispondenza esatta tra le righe di due tabelle. La tabella risultante da una outer join trattiene tutti quei record che non hanno alcuna corrispondenza tra le tabelle. Le outer joins si suddividono in left outer joins, right outer joins, e full outer joins, in base a quale sia la tabella di cui intendiamo trattenere i valori in caso di mancata corrispondenza della regola di confronto da (sinistra, destra, o entrambi).

(In questo caso left (sinistra) e right (destra) si riferiscono ai due lati della keyword JOIN.)

Left outer join

Il risultato di una query left outer join (o semplicemente left join) per le tabelle A e B contiene sempre tutti i record della tabella di sinistra ("left") A, mentre vengono estratti dalla tabella di destra ("right") B solamente le righe che trovano corrispondenza nella regola di confronto della join. Questo significa che se la clausola ON trova 0 (zero) righe in B, la join mostrerà una riga risultante con valore NULL in tutte le colonne corrispondenti al risultato per la colonna B.

Esempio di una left outer join:

SELECT *  
FROM   Impiegati  LEFT OUTER JOIN Dipartimenti  
          ON Impiegati.ID_dipartimento = Dipartimenti.ID_dipartimento

Impiegati.Cognome	Impiegati.ID_dipartimento	Dipartimenti.Nome_dipartimento	Dipartimenti.ID_dipartimento
Bianchi	33	Tecnico	33
Rossi	31	Vendite	31
Santoro	34	Risorse umane	34
Monti	34	Risorse umane	34
Grassi	Template:Null result	Template:Null result	Template:Null result
Mancini	33	Tecnico	33

Right outer joins

Una right outer join (o right join) semplicemente ricalca il funzionamento della left outer join, ma invertendo l'ordine delle tabelle interessate. Every row from the "right" table (B) will appear in the joined table at least once. If no matching row from the "left" table (A) exists, NULL will appear in columns from A for those records that have no match in A.

A right outer join returns all the values from the right table and matched values from the left table (NULL in case of no matching join predicate).

For example, this allows us to find each Impiegati and his or her Dipartimenti, but still show Dipartimentis that have no Impiegatis.

Example right outer join, with the additional result row italicized:

SELECT * 
FROM   Impiegati RIGHT OUTER JOIN Dipartimenti 
          ON Impiegati.ID_dipartimento = Dipartimenti.ID_dipartimento

Impiegati.Cognome	Impiegati.ID_dipartimento	Dipartimenti.Nome_dipartimento	Dipartimenti.ID_dipartimento
Monti	34	Risorse umane	34
Bianchi	33	Tecnico	33
Santoro	34	Risorse umane	34
Mancini	33	Tecnico	33
Rossi	31	Vendite	31
Template:Null result	Template:Null result	Promozione	35

In practice, explicit right outer joins are rarely used, since they can always be replaced with left outer joins (with the table order switched) and provide no additional functionality. The result above is produced also with a left outer join:

SELECT * 
FROM   Dipartimenti LEFT OUTER JOIN Impiegati
          ON Impiegati.ID_dipartimento = Dipartimenti.ID_dipartimento

Full outer join

A full outer join combines the results of both left and right outer joins. The joined table will contain all records from both tables, and fill in NULLs for missing matches on either side.

For example, this allows us to see each Impiegati who is in a Dipartimenti and each Dipartimenti that has an Impiegati, but also see each Impiegati who is not part of a Dipartimenti and each Dipartimenti which doesn't have an Impiegati.

Example full outer join:

SELECT *  
FROM   Impiegati 
       FULL OUTER JOIN Dipartimenti 
          ON Impiegati.ID_dipartimento = Dipartimenti.ID_dipartimento

Impiegati.Cognome	Impiegati.ID_dipartimento	Dipartimenti.Nome_dipartimento	Dipartimenti.ID_dipartimento
Monti	34	Risorse umane	34
Bianchi	33	Tecnico	33
Santoro	34	Risorse umane	34
Grassi	Template:Null result	Template:Null result	Template:Null result
Mancini	33	Tecnico	33
Rossi	31	Vendite	31
Template:Null result	Template:Null result	Promozione	35

Some database systems (like MySQL) do not support this functionality directly, but they can emulate it through the use of left and right outer joins and unions. The same example can appear as follows:

SELECT *
FROM   Impiegati 
       LEFT JOIN Dipartimenti 
          ON Impiegati.ID_dipartimento = Dipartimenti.ID_dipartimento
UNION
SELECT *
FROM   Impiegati
       RIGHT JOIN Dipartimenti
          ON Impiegati.ID_dipartimento = Dipartimenti.ID_dipartimento
WHERE  Impiegati.ID_dipartimento IS NULL

SQLite does not support right join, so outer join can be emulated as follows:

SELECT Impiegati.*, Dipartimenti.*
FROM   Impiegati 
       LEFT JOIN Dipartimenti 
          ON Impiegati.ID_dipartimento = Dipartimenti.ID_dipartimento
UNION
SELECT Impiegati.*, Dipartimenti.*
FROM   Dipartimenti
       LEFT JOIN Impiegati
          ON Impiegati.ID_dipartimento = Dipartimenti.ID_dipartimento
WHERE  Impiegati.ID_dipartimento IS NULL

Self-join

A self-join is joining a table to itself.^[1] This is best illustrated by the following example.

Example

A query to find all pairings of two Impiegatis in the same country is desired. If you had two separate tables for Impiegatis and a query which requested Impiegatis in the first table having the same country as Impiegatis in the second table, you could use a normal join operation to find the answer table. However, all the Impiegati information is contained within a single large table. ^[2]

Considering a modified Impiegati table such as the following:

Impiegati Table
ImpiegatiID	Cognome	Country	ID_dipartimento
123	Rossi	Australia	31
124	Bianchi	Australia	33
145	Mancini	Australia	33
201	Santoro	United States	34
305	Monti	United Kingdom	34
306	Grassi	United Kingdom	Template:Null result

An example solution query could be as follows:

SELECT F.ImpiegatiID, F.Cognome  , S.ImpiegatiID, S.Cognome  , F.Country
FROM Impiegati F, Impiegati S
WHERE F.Country = S.Country
AND F.ImpiegatiID < S.ImpiegatiID
ORDER BY F.ImpiegatiID, S.ImpiegatiID;

Which results in the following table being generated.

Impiegati Table after Self-join by Country
ImpiegatiID	Cognome	ImpiegatiID	Cognome	Country
123	Rossi	124	Bianchi	Australia
123	Rossi	145	Mancini	Australia
124	Bianchi	145	Mancini	Australia
305	Monti	306	Grassi	United Kingdom

For this example, note that:

F and S are aliases for the first and second copies of the Impiegati table.
The condition F.Country = S.Country excludes pairings between Impiegatis in different countries. The example question only wanted pairs of Impiegatis in the same country.
The condition F.ImpiegatiID < S.ImpiegatiID excludes pairings where the ImpiegatiIDs are the same.
F.ImpiegatiID < S.ImpiegatiID also excludes duplicate pairings. Without it only the following less useful part of the table would be generated (for the United Kingdom only shown):

ImpiegatiID	Cognome	ImpiegatiID	Cognome	Country
305	Monti	305	Monti	United Kingdom
305	Monti	306	Grassi	United Kingdom
306	Grassi	305	Monti	United Kingdom
306	Grassi	306	Grassi	United Kingdom

Only one of the two middle pairings is needed to satisfy the original question, and the topmost and bottommost are of no interest at all in this example.

Alternatives

The effect of outer joins can also be obtained using correlated subqueries. For example

SELECT Impiegati.Cognome  , Impiegati.ID_dipartimento, Dipartimenti.Nome_dipartimento 
FROM   Impiegati LEFT OUTER JOIN Dipartimenti 
          ON Impiegati.ID_dipartimento = Dipartimenti.ID_dipartimento

can also be written as

SELECT Impiegati.Cognome , Impiegati.ID_dipartimento,
  (SELECT Dipartimenti.Nome_dipartimento 
    FROM Dipartimenti
   WHERE Impiegati.ID_dipartimento = Dipartimenti.ID_dipartimento )
FROM   Impiegati

Implementation

Much work in database-systems has aimed at efficient implementation of joins, because relational systems commonly call for joins, yet face difficulties in optimising their efficient execution. The problem arises because (inner) joins operate both commutatively and associatively. In practice, this means that the user merely supplies the list of tables for joining and the join conditions to use, and the database system has the task of determining the most efficient way to perform the operation. A query optimizer determines how to execute a query containing joins. A query optimizer has two basic freedoms:

Join order: Because joins function commutatively and associatively, the order in which the system joins tables does not change the final result-set of the query. However, join-order does have an enormous impact on the cost of the join operation, so choosing the best join order becomes very important.
Join method: Given two tables and a join condition, multiple algorithms can produce the result-set of the join. Which algorithm runs most efficiently depends on the sizes of the input tables, the number of rows from each table that match the join condition, and the operations required by the rest of the query.

Many join-algorithms treat their inputs differently. One can refer to the inputs to a join as the "outer" and "inner" join operands, or "left" and "right", respectively. In the case of nested loops, for example, the database system will scan the entire inner relation for each row of the outer relation.

One can classify query-plans involving joins as follows:^[3]

left-deep: using a base table (rather than another join) as the inner operand of each join in the plan
right-deep: using a base table as the outer operand of each join in the plan
bushy: neither left-deep nor right-deep; both inputs to a join may themselves result from joins

These names derive from the appearance of the query plan if drawn as a tree, with the outer join relation on the left and the inner relation on the right (as convention dictates).

Join algorithms

Three fundamental algorithms exist for performing a join operation.

Nested loops

Template:Main2

Use of nested loops produces the simplest join-algorithm. For each tuple in the outer join relation, the system scans the entire inner-join relation and appends any tuples that match the join-condition to the result set. Naturally, this algorithm performs poorly with large join-relations: inner or outer or both. An index on columns in the inner relation in the join-predicate can enhance performance.

The block nested loops (BNL) approach offers a refinement to this technique: for every block in the outer relation, the system scans the entire inner relation. For each match between the current inner tuple and one of the tuples in the current block of the outer relation, the system adds a tuple to the join result-set. This variant means doing more computation for each tuple of the inner relation, but far fewer scans of the inner relation.

Merge join

If both join relations come in order, sorted by the join attribute(s), the system can perform the join trivially, thus:

Consider the current "group" of tuples from the inner relation; a group consists of a set of contiguous tuples in the inner relation with the same value in the join attribute.
For each matching tuple in the current inner group, add a tuple to the join result. Once the inner group has been exhausted, advance both the inner and outer scans to the next group.

Merge joins offer one reason why many optimizers keep track of the sort order produced by query plan operators—if one or both input relations to a merge join arrives already sorted on the join attribute, the system need not perform an additional sort. Otherwise, the DBMS will need to perform the sort, usually using an external sort to avoid consuming too much memory.

Template:Seealso

Hash join

A hash join algorithm can only produce equi-joins. The database system pre-forms access to the tables concerned by building hash tables on the join-attributes. The lookup in hash tables operates much faster than through index trees. However, one can compare hashed values only for equality, not for other relationships.

Notes

^ Shah, p. 165
^ Adapted from Pratt, pp. 115–6
^ Yu e Meng, p. 213

References

Phillip J Pratt, A Guide To SQL, Seventh Edition, Thomson Course Technology, 2005.
Nilesh Shah, Database Systems Using Oracle - A Simplified Guide to SQL and PL/SQL Second Edition, Pearson Education International, 2005.
Clement T. Yu, Principles of Database Query Processing for Advanced Applications, Morgan Kaufmann, 1998. URL consultato il 3 marzo 2009.

External links

Template:Databases

[1] Shah, p. 165

[2] Adapted from Pratt, pp. 115–6

[Yu1998-3] Yu e Meng, p. 213

[1]

[2]

[3]