The SQL Join
There are three categories of joins that you can make use of in SQL:
- Inner Join
- Outer Join
- Cross Join
But before we dive into the categories of joins, we first need to have an understanding of what a join really is.
Joins are used in SQL to bring together all the relevant data from multiple database tables. Remember that we've broken data down into multiple tables and established relationships between the tables.
An example we've used in the past was the tables author and book. We are going to switch things up in this post and change the relationship of the author and book tables to be a one-to-many relationship (as opposed to the previous many-to-many relationship)
So what does the relationship between author and book look like as a one-to-many relationship?
Author |
---|
<<PK>> author_id [int (11)] |
first_name [varchar(20)] |
last_name [varchar(20)] |
Book |
---|
<<PK>> book_id [int (11)] |
<<FK>> author_id [int (11)] |
book_title [varchar(255)] |
ISBN [varchar(255)] |
version [varchar(10)] |
The relationship between these two tables is based on the author_id column. So how does this relate to the topic of joins?
Let's say we're in a situation where we want to ask the question “Which books did which author create?” We answer this question with a join. When we join the tables together, we marry the data together (smash it together) based on the existing relationship the tables have with each other. If there is no relationship, then we can't join the tables… make sense?
SQL Inner Join
So, the first type of join I want to talk about is one of the most commonly used joins in SQL, and that's the inner join. The inner join matches data from both tables with each other, and if there isn't a match, then there is no data returned. This is an important concept to understand, as it's what separates the inner join from the other types. So I'll say it again, the SQL inner join will not return a row if there is no match from one table to the other.
Here's an example that should illustrate this concept nicely:
Author
Author_id | First_Name | Last_Name |
---|---|---|
1 | Trevor | Page |
2 | Jane | Doe |
3 | Jack | Johnson |
Book
Book_id | Author_id | Book_title |
---|---|---|
1 | 1 | How to Program with Java 2nd Edition |
2 | 1 | How to Program with Java 1st Edition |
3 | 3 | How to Play Guitar |
Let's assume that the data above is what's stored in our database (in the author
and book
tables). What would happen if we invoked an inner join like so:
select * from author inner join book on author.author_id = book.author_id;
The results of this query will follow the rules that I outlined above… if there is no match (based on the relationship) then it won't return any results. So because of this fact, there will only be three rows returned (instead of 4) because the author
“Jane Doe” doesn't have a corresponding book
.
Let's see what we get when we run this sql script:
Author_id | First_Name | Last_Name | Book_id | Book_title |
---|---|---|---|---|
1 | Trevor | Page | 1 | How to Program with Java 2nd Edition |
1 | Trevor | Page | 2 | How to Program with Java 1st Edition |
3 | Jack | Johnson | 3 | How to Play Guitar |
So as you can see, since we've used an inner join, author Jane Doe
doesn't even show up as a result because she doesn't have a matching row in the book
table.
Make sense?
SQL Outer Join
Okay, so now let's move onto the second most used join type, the SQL outer join.
Let's say that we are in a situation where we want to see all the rows from the author table as well as any pertinent “joined” data from the book
table. How can we accomplish that?
Outer joins of course!
SQL outer joins actually break down into three categories:
- Left Outer Join
- Right Outer Join
- Full Outer Join
The most commonly used of these three (by far) is the left outer join, so that's the one that we will be focusing our attention on.
As I mentioned above, the left outer join is useful when you don't want to exclude any rows from one table if there doesn't happen to be any data to join to. So let's say we want to see all the authors (including the authors that haven't yet published a book), and we also want to see the book titles that belong to those authors who HAVE published a book. This requirement demands the use of a left join. Here's what it would look like:
select * from author left join book on author.author_id = book.author_id;
Here's what the results of this query would be:
Author_id | First_Name | Last_Name | Book_id | Book_title |
---|---|---|---|---|
1 | Trevor | Page | 1 | How to Program with Java 2nd Edition |
1 | Trevor | Page | 2 | How to Program with Java 1st Edition |
2 | Jane | Doe | null | null |
3 | Jack | Johnson | 3 | How to Play Guitar |
Can you spot the difference in the results that were returned from the left join vs the inner join?
The difference is that with the left join we actually get the Jane Doe
row to display… however, you'll notice that instead of displaying information about Jane's book, you see null
. This is because Jane doesn't have any matching rows in the book
table, so there's nothing for SQL to display in these book
columns.
So to sum up, the when there's no matching data, the left join will display null
s but still return a result, whereas the inner join will exclude the entire row altogether.