Database is a structured information system, with integrity stored in storage devices, to meet the needs of exploitation and use by many different people or programs. The role and impact of databases on the operations of organizations, agencies and businesses is undeniable. And in each database, most operations are performed based on a certain rule, especially when combined with Structured Query Language (SQL), the ability to manipulate real data extremely easy and convenient. One of the most basic functions in SQL, helping users to join tables to query data from different tables in the total database is the JOIN function.
JOIN is the operation to join data from many tables together (join 2 or more tables together). When you need to query columns of data from different tables to return in the same result set, you need to use JOIN. Two tables can be linked together when there is a common field between these two tables. In case it is necessary to join more than 2 tables, it is necessary to join many tables through multiple JOIN functions based on the same mechanism as when joining 2 tables together.
1. TYPES OF JOIN
There are 5 types of JOIN: CROSS JOIN, INNER JOIN, OUTER JOIN, LEFT JOIN, RIGHT JOIN with different uses in data query operation activities. However, these types of JOINs are easily misunderstood and misused. Therefore, the example below (with data taken from the adventureworks database) will be applied to distinguish the JOIN types.
Customer ID | Last Name |
1 | Gee |
2 | Harris |
3 | Carreras |
29485 | Abel |
29486 | Abercrombie |
29489 | Adams |
Table 1. Name
Customer ID | City |
29485 | Van Nuys |
29486 | Branch |
29489 | Modesto |
Table 2. City
2.1 CROSS JOIN
CROSS JOIN is the simplest form of JOIN. It matches each row from a database table with all other rows. In other words, it gives us the union of each row of the first table with all the records in the second table.
A CROSS JOIN operation returns the Cartesian product of sets of rows from the joined tables.
Example:
In case we want to use CROSS JOIN with the two data tables mentioned at the beginning of the chapter,
SELECT * FROM Name
CROSS JOIN City
Results:
Customer ID | Last Name | City |
1 | Gee | Van Nuys |
1 | Gee | Branch |
1 | Gee | Modesto |
2 | Harris | Van Nuys |
2 | Harris | Branch |
2 | Harris | Modesto |
3 | Carreras | Van Nuys |
3 | Carreras | Branch |
3 | Carreras | Modesto |
29485 | Abel | Van Nuys |
29485 | Abel | Branch |
29485 | Abel | Modesto |
29486 | Abercrombie | Van Nuys |
29486 | Abercrombie | Branch |
29486 | Abercrombie | Modesto |
29489 | Adams | Van Nuys |
29489 | Adams | Branch |
29489 | Adams | Modesto |
We can see that the CROSS JOIN does not find the corresponding records. This JOIN is rarely used, except to generate all possible combinations of records from data tables that do not share a common element.
1.2. INNER JOIN
INNER JOIN is a JOIN command used to combine rows from both tables that satisfy the given condition. Using INNER JOIN is essentially finding the intersection of two data tables. This is the most commonly used JOIN type and is considered the default JOIN operation.
Example:
In case we want to use (INNER) JOIN with the two data tables mentioned at the beginning of the chapter.
SELECT * FROM Name
(INNER) JOIN City
ON Name.CustomerID = City.CustomerID
Results:
Customer ID | Last Name | City |
29485 | Abel | Van Nuys |
29486 | Abercrombie | Branch |
29489 | Adams | Modesto |
Be careful when combining tables on columns that could be NULL because a NULL value will never have a match. When you use INNER JOIN, columns that contain NULL are not included in the result set because they do not match any values. Values that are NULL do not match other NULL values.
For INNER JOIN, the condition written in the ON clause is functionally equivalent to the condition written in the WHERE clause.
1.3 LEFT (OUTER) JOIN
A LEFT JOIN (or LEFT OUTER JOIN) is very different from an INNER JOIN. Instead of restricting the results obtained in both tables, it restricts only the results in the left table (A). That is, if the ON clause has no corresponding record in table B, 1 row in the result is still returned but with a NULL value for each column in table B.
It returns all values from the left table + those corresponding to the right table or null (when the values in the right table do not match).
Example:
In case we want to use LEFT (OUTER) JOIN with the two data tables mentioned at the beginning of the chapter,
SELECT * FROM Name
LEFT JOIN City
ON Name.CustomerID = City.Customer
LEFT JOIN returns all Customer IDs of the left table even if no matching City is found in the right table. The result of the City field for Customer IDs not in the City table (1,2,3) is Null.
1.4. RIGHT (OUTER) JOIN
RIGHT JOIN (or RIGHT OUTER JOIN) is the opposite of LEFT JOIN. RIGHT JOIN is a JOIN type that results in all columns from the right table even if no matching rows are found in the left table. Where no match is found in the table on the left, the result is Null.
To be specific, RIGHT JOIN is almost the same as LEFT JOIN, except that the order of the tables is reversed. Each record from the right table, B will be returned and NULL will be returned for rows that do not have a corresponding record in table A.
Example:
In case we want to use RIGHT (OUTER) JOIN with the two data tables mentioned at the beginning of the chapter
SELECT * FROM City
RIGHT JOIN Name
ON Name.CustomerID = City.CustomerID
Results:
Customer ID | City | Last Name |
29485 | Van Nuys | Abel |
29486 | Branch | Abercrombie |
29489 | Modesto | Adams |
1 | null | Gee |
2 | null | Harris |
3 | null | Carreras |
2.5. FULL (OUTER) JOIN
FULL (OUTER) JOIN combines both the results of the LEFT OUTER JOIN and the RIGHT OUTER JOIN. These types of JOINs take records of both tables, and fill the corresponding missing rows on both sides with NULLs. In other words, it can detect records with no matches in the joined table and return NULL records of the joined table if no match is found.
Some database systems do not support FULL OUTER JOIN, but it can be replaced by using LEFT and RIGHT OUTER JOIN with UNION.
Example:
In case we want to use FULL (OUTER) JOIN with the two data tables mentioned at the beginning of the chapter
SELECT * FROM City
FULL JOIN Name
ON Name.CustomerID = City.CustomerID
SELECT Name.CustomerID
, Name.LastName
, City.City
FROM Name
LEFT JOIN City
ON Name.CustomerID = City.CustomerID
UNION
SELECT Name.CustomerID
, Name.LastName
, City.City
FROM Name
RIGHT JOIN City
ON Name.CustomerID = City.CustomerID
Results:
Customer ID | Last Name | City |
1 | Gee | null |
2 | Harris | null |
3 | Carreras | null |
29485 | Abel | Van Nuys |
29486 | Abercrombie | Branch |
29489 | Adams | Modesto |
Because the Name table already includes all the CustomerIDs of the City table, in this case FULL JOIN returns the same result as LEFT JOIN.
Customer ID | City |
29490 | Lewiston |
Suppose the City table adds information of another CustomerID as:
Results will change to:
Customer ID | Last Name | City |
1 | Gee | null |
2 | Harris | null |
3 | Carreras | null |
29485 | Abel | Van Nuys |
29486 | Abercrombie | Branch |
29489 | Adams | Modesto |
29490 | null | Lewiston |
2.6. OTHER JOINs
2.6.1. NATURAL JOIN
NATURAL JOIN is also a little-known concept when learning about SQL. It is a type of JOIN that is rarely used in SQL statements. However PostgreSQL supports it.
NATURAL JOIN is a variant of INNER JOIN. However, NATURAL JOIN no longer specifies an ON condition to join tables like INNER JOIN. Instead, it will link itself through columns with the same name. Therefore, we should not have the same column names between tables that have no data link between them. If someone adds a new column to one of the tables with the same name as a column in another table, they could break any existing natural links. It is actually a bug waiting to happen.
SELECT Name.CustomerID
, Name.LastName
, City.City
FROM Name
NATURAL JOIN City
2.6.2. DIFFERENCE OF (INNER) JOIN….ON VS (INNER) JOIN….USING
If several columns have the same name but you don’t want to join using all of these common columns, the USING clause is used. The columns listed in the USING clause can not have any qualifiers in the statement, including the WHERE clause.
SELECT * FROM Name
(INNER) JOIN City
USING CustomerID
With CustomerID from Table ‘Name’ is the same with Table ‘City’
When the column names in two tables do not match, the ON clause is used to join the tables. The filter conditions in the WHERE clause are devoid of the JOIN conditions.
Assume that CustomerID column in Table ‘City’ changes to Ctm_ID
SELECT * FROM Name
(INNER) JOIN City
ON Name.CustomerID = City.Ctm_ID
Also, one significant distinction is that the USING clause merges the columns from the joined tables into a single column, whereas the ON clause preserves the columns from each joined table separately. This is crucial, for instance, if you only want to keep rows in your result set if a matching row doesn’t already exist in one of the joined tables.
CONCLUSION
By using JOIN you can get the job done using just one query with any search parameters. On the other hand, MySQL can achieve better performance. Just using a single query instead of running multiple queries can reduce costs. Instead, using multiple queries will result in more data being transferred between MySQL and the applications (software). Moreover, it requires more data manipulation in the application.
Obviously, it is undeniable that we can achieve better SQL and application performance by using the JOIN command.