Module 3: Understanding Inner Join
Previous Tutorial - Next Tutorial
Inner join is the most common type of join, with an Inner join you can match rows from two or more tables based on logical relationships between the tables. The inner join returns only matching rows for which the predicate evaluates to true. Rows for which the predicate evaluates to false or unknown are discarded. This is the general syntax of inner join. As you see that the two tables have been joined with the help of Inner join and ON keywords. Observe that we need to define the predicate after the ON keyword.
The equal sign between each of the columns means that this query returns only rows where there are matching records in each of the tables. This is known as EQUI join, meaning that the values in two tables compared in the join operation must be equal. Although join conditions usually have equality comparison but you can also specify other relational operators such as not equal to operator, greater than or equal to operator or less than or equal to operator, etc.
In the diagram, in the left table the A, B, C represents the primary key values of the table and it matches up with the corresponding rows in the right-hand side table where the same value found, and it produced the result, which returns only matching rows for which this predicate evaluates to true.
When multiple tables are referenced in a single query, all column references must be unambiguous. It means any column name that is duplicated between two or more tables referenced in the query must be qualified with the table name. When a column name is not duplicated in two or more tables used in the query, then references to it do not have to be qualified with the table name. But such a SELECT statement is sometimes difficult to read and understand because there is nothing to indicate which table references which particular column.
The readability of the query is true if all the columns are qualified with their table names. The readability is further true if table aliases are used, especially when the table names themselves must be qualified with the database and all the names. We will see this in the practical part. Let's open the SQL server management studio to see this join in action. In SQL server management studio I have defined the two queries. In the first query, we will fetch the record from the customer table whose person ID is not null. If I execute this query, I will get the 19119 records from this customer table.
My requirement is I want to know what is the first name and last name of this person's ID, but the name of the customer information resides in this person's table. If I execute this query, we will get this first name and last name, middle name, etc.
I need to join the person table with the customer table based on the business entity ID column and person ID column. Let us write the query to join these two tables using the inner join. So first, let me copy this query here(INNER JOIN Person.Person ON person.BusinessEntityID). We join these two tables on the business entity ID column and person ID column. Here note that I have defined the Star, if I execute this query, I will get all the columns. Let me define the individual column to get the required data (SELECT person.BusinessEntityID, customer.PersonID, person.LastName, person.FirstName, Customer.AccountNumber)
This is how you can join these two tables by using Inner join. You can define the same query by using the graphic query editor feature of SQL server management studio. To use that graphic query feature just right-click on it and select this design query in Editor. It will open a window, select your appropriate table. In our case, we need to select the customer table, click on ADD and then you will select the person table, click on ADD and close it.
If you have already defined the primary key and foreign key relationship, then it will automatically connect these two tables. Let us assume that this join is not there, so you can remove this join. Now to connect these two tables, just click this column, select and drag it on the business entity ID column. It will automatically join these two tables and over here you can see by default it will take the Inner join. Then you can select your appropriate column by clicking the checkbox and then click on OK. This graphic query editor will build the query for us by using that GUI. Now if we execute this query, it will give us the same record.
For the same query, you can also use the alias. Let's see how we can define and use the alias for this query. First, let us define the alias for this customer table, and then we will define the alias for this person table. Once you define the alias you need to use the SELECT clause which is(SELECT p.BusinessEntityID, cust.PersonID, p.LastName, p.FirstName, cust.AccountNumber). Let's execute the query and here observe that we use the table alias to prefix even nonambiguous columns such as last name, first name.
This practice is not mandatory as long as the column name is not ambiguous, but it is still considered a best practice for clarity. If I remove this alias still I can get the same result, but we should always have to use the best practice.
You can filter the records further by adding the WHERE clause. If I add the WHERE clause and execute this query, I will get the P records. You can also add this same condition with the join clause. First of all copy this screen. To use that condition in the join clause, I need to define the end logical operator and then I will define the same condition which is (INNER JOIN Person.Person AS p ON p.BusinessEntityID = cust.PersonID AND p.LastName= 'benson'). If I execute both queries together, we will get the same record.
But a very common question that comes to mind is what is the difference between ON and the WHERE clause and does it matter if you specify your predicate in one or the other? The answer is that for Inner join, it doesn't matter if both clauses solve the same filtering purpose. Both filter only rows for which the predicate evaluates to true and discard rows for which the predicate evaluates to false or unknown.
Let's see the execution plan of both these queries. In the execution plan, you can see that both queries will cost 50 percent, and internally Query Optimizer uses the same Hash Match operator to perform the task. But here you need to keep one thing in mind that ON and WHERE play a different role in the query. You need to figure out according to your requirement, which is the appropriate clause for each of your predicates.
So joining the table within Inner join syntax is ANSI standard, but you can also join the table with the WHERE clause. If you work with long-time database professionals who have good knowledge in other products such as Oracle or Informix, then you are likely to encounter an older style syntax of the join which is discouraged in SQL Server.
Let see how you can write this query with the older style join syntax. Copy this query and open it in a new window. To use another table, I need to use a comma and then I need to define another table and then in the where clause I need to define that column which we have used in the inner join which is(WHERE customer.PersonID = person.BusinessEntityID). If I execute the query I will get the same result which I already called using the Inner join syntax.
In this example, both the customer and person tables are referenced in the FROM clause by using the comma, and the join operation is performed in the WHERE clause because it is an older technique.
It is often referred to as legacy join, although it is still partially supported by SQL Server. But this is not the recommended approach because an inner join is the most commonly used type of join. The SQL server decided to make it the default, in case you specify, only the join keyword. If I remove this inner keyword and execute the query, then I will get the same record. That means the inner keyword is optional. Let's move on to the other join, which is outer join.
Web API for developers Free Trial Offline SDK
Here's SQL video tutorial:
Other useful tutorials:
- General SQL Introduction
- Module 1: Getting Started with SQL Server
- Module 1: SQL Server Management
- Module 1: Essential SQL Commands
- Module 1: SQL Introduction Essentials
- Module 2: Learning SQL Overview
- Module 2: Logical Query Processing Order
- Module 2: Select Statement Fundamentals
- Module 2: Filtering Data with WHERE Clause Overview
- Module 2: Filter Data with WHERE Clause - Part 1
- Module 2: How to Filter Data with WHERE Clause - Part 2
- Module 2: Filter PDF Data Using WHERE clause - Part 3
- Module 2: Sorting Data Using ORDER BY Clause
- Module 2: Grouping Data with GROUP BY Clause Part 1
- Module 2: Grouping Data with GROUP BY Clause Part 2
- Module 2: Grouping Data with GROUP BY Clause Part 3
- Module 2: Limiting Data With TOP Clause
- Module 2: Summary and Quiz
- Module 3: SQL Joins and SET Operators
- Module 3: Understanding Inner Join
- Module 3: Understanding Outer Join
- Module 3: Cross Join Explained
- Module 3: Self Join Explained
- Module 3: UNION and UNION ALL
- Module 3: INTERSECT and EXCEPT
- Module 3: Understanding Sub-Queries
- SQL Analytic Functions