LEFT JOIN versus INNER JOIN

**Pradeep1210** · Dec 28th, 2009, 09:16 AM

In situations where both would produce same result (e.g. foreign keys on not-nullable fields etc.), which one should be preferred performance wise - INNER JOIN or LEFT/RIGHT JOIN?

**abhijit** · Dec 28th, 2009, 01:10 PM

Interesting question. I suppose a OUTER join would slow things compared to a INNER JOIN.

**Pradeep1210** · Dec 28th, 2009, 02:10 PM

Not sure, but I think inner join should be slower because it has additional overhead of removing unmatched rows, which left/right outer joins don't have.

**GaryMazzone** · Dec 28th, 2009, 03:13 PM

I believe as abhijit does. The outer join performs slower the the inner join.

**Pradeep1210** · Dec 28th, 2009, 06:25 PM

I tried to compare the two types like this:

1. Created two tables:

sql Code:

USE [tempdb]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[Table1](
    [Column1] [int] NOT NULL,
 CONSTRAINT [PK_Table1] PRIMARY KEY CLUSTERED 
(
    [Column1] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
 
CREATE TABLE [dbo].[Table2](
    [Column1] [int] NOT NULL
) ON [PRIMARY]
GO

2. Inserted some random records in both tables. Table1 is filled with numbers from 1 to 1000000, and table2 has random records selected from table1.

sql Code:

truncate table table1
truncate table table2
 
declare @i int
set @i = 1
while @i < 1000000
begin
    insert into table1 values(@i)
    set @i =@i + 1
end
select count(*) from table1
go
 
insert into table2 select column1 from table1 where column1%3=0 or column1%7=0
select count(*) from table2
go
 
insert into table2 select column1 from table1 where column1%3=0 or column1%7=0
select count(*) from table2
go
 
insert into table2 select column1 from table1 where column1%5=0 or column1%11=0
select count(*) from table2
go
 
insert into table2 select column1 from table1 where column1%4=2
select count(*) from table2
go

3. Created a stored procedure to benchmark results:

sql Code:

drop procedure BenchMarkPerformance 
GO
 
Create Procedure BenchMarkPerformance
AS
SET NOCOUNT ON
DECLARE @Query1 nvarchar(2048)
DECLARE @Query2 nvarchar(2048)
 
DECLARE @started datetime
DECLARE @SampleSize bigint
DECLARE @NumTime varchar(100)
DECLARE @StrTime varchar(100)
DECLARE @counter int
 
set @Query1 = 'select table1.column1 from table2 left  join table1 on table2.column1 = table1.column1'
set @Query2 = 'select table1.column1 from table2 inner join table1 on table2.column1 = table1.column1'
set @counter = 0;
 
-- ********** SAMPLE SIZE **********
select @Samplesize = count(*) from table2;
Print 'Sample Size : table2 = ' + cast(@SampleSize as varchar);
select @Samplesize = count(*) from table1;
Print 'Sample Size : table1 = ' + cast(@SampleSize as varchar);
--***********************************
 
while @counter < 10
begin
    set @counter = @counter + 1;
    dbcc freeproccache WITH NO_INFOMSGS ;
    dbcc dropcleanbuffers WITH NO_INFOMSGS ;
 
    -- ********** STRING SEARCH ********
    set @started = getdate(); 
    EXECUTE sp_executesql @Query1;
    set @StrTime = '    LEFT  JOIN: ' + CONVERT(VARCHAR(26), getdate() - @started, 114);
    --***********************************
 
    -- ********** NUMERIC SEARCH ********
    set @started = getdate(); 
    EXECUTE sp_executesql @Query2;
    set @NumTime = '    INNER JOIN: ' + CONVERT(VARCHAR(26), getdate() - @started, 114);
    --***********************************
 
    print Cast(@counter as varchar) +  @StrTime + @NumTime;
    waitfor delay '00:00:01' 
end

4. Finally executed this stored procedure.

sql Code:

exec BenchMarkPerformance

5. Results:

Code:

Sample Size : table2 = 1379869
Sample Size : table1 = 999999
1    LEFT  JOIN: 00:00:14:643    INNER JOIN: 00:00:13:483
2    LEFT  JOIN: 00:00:14:280    INNER JOIN: 00:00:13:487
3    LEFT  JOIN: 00:00:13:560    INNER JOIN: 00:00:14:017
4    LEFT  JOIN: 00:00:13:330    INNER JOIN: 00:00:13:467
5    LEFT  JOIN: 00:00:13:453    INNER JOIN: 00:00:14:453
6    LEFT  JOIN: 00:00:13:390    INNER JOIN: 00:00:13:907
7    LEFT  JOIN: 00:00:13:407    INNER JOIN: 00:00:13:500
8    LEFT  JOIN: 00:00:13:403    INNER JOIN: 00:00:13:470
9    LEFT  JOIN: 00:00:14:390    INNER JOIN: 00:00:13:547
10    LEFT  JOIN: 00:00:13:423    INNER JOIN: 00:00:13:467

So from above it seems like there is no difference in performance of the two and the whole exercise was a waste.

**techgnome** · Dec 28th, 2009, 06:58 PM

In general, INNER JOINS are faster, as they can eliminate results as soon as a mismatch is found based on the join. At least that's been my experience.

To answer the original question though, if you want to ALWAYS get the same results, then I'd go with the INNER JOIN .... BUT if you think you might want those mismatched records when they do occur, then go with the LEFT... the query should be based on the desired results... and users being an interesting lot, will find a way to hoark up the data and produce a mismatched result when you may (left join) or may not (inner join) want it.

-tg

**leinad31** · Dec 28th, 2009, 07:25 PM

Syntax of query influences tuning but in the end it is the query execution plan that determines performance.

Your sample/test queries are too simple and return the same result. Think of it this way, in a 5-table join if you intend to get a 2% subset then you are better off eliminating as much of the data at the start of query execution plan rather than initially doing LEFT joins with the INNER join and WHERE criteria being applied to the end. You will only waste effort/resources (CPU cycles, temp storage, memory, etc) on comparisons/data that will be thrown away or not included in the final result so do the INNER join as early as possible.

Same goes if it was a manual paper process.... if you were asked to retrieve related documents from 5 piles then you will eliminate as much of them first (best if they are sorted or indexed) or extract only relevant documents from the piles. And finally you staple together the remaining documents that are relevant to your task; least amount of effort required on your part. You will avoid checking each and every document as much as possible.

But it also depends on how data is to be returned, or the columns to be returned.... sometimes you have no choice but to do a LEFT join to show that data from right-hand side of relationship does not exist (columns from right hand side needed) or for aggregation purposes (more common in data warehouse).

**abhijit** · Dec 29th, 2009, 08:45 AM

Pradeep,
I think your exercise should have checked the execution plan instead of the timing of the query results.

The execution plan will give you a more accurate picture of which operation will result in maximum cost.

**Pradeep1210** · Dec 29th, 2009, 10:01 AM

@tg, leinad
Yes, inner joins are definitely faster. But there comes many situations in our programs where the backend database supports null fields but is not allowed from the front end application. So in such cases, a minor bug may also cause a null value to be inserted there causing inner join to eleminate the rows in the output result where as the outer join seems like a safe bet. But I'm looking at performance implications in such cases.

@abhijit
I had compared the execution plans but forgot to mention there. The execution plan for both cases is exactly same. They start varying only when the SELECT clause is altered. So it seems like sql server does some smart manipulations and finds out that both would produce exactly same results. This is also evident from the result of experiment. They match neck to neck.

**techgnome** · Dec 29th, 2009, 11:25 AM

well, if the inner join eliminates data that you require, then the inner join isn't an option. If that's the case, then you need the outer join. Which is what I was trying to point out (and I'll admit that I'm not entirely sure that's how it came out.)

-tg

**leinad31** · Jan 3rd, 2010, 07:16 PM

Originally Posted by Pradeep1210

@tg, leinad
Yes, inner joins are definitely faster. But there comes many situations in our programs where the backend database supports null fields but is not allowed from the front end application. So in such cases, a minor bug may also cause a null value to be inserted there causing inner join to eleminate the rows in the output result where as the outer join seems like a safe bet. But I'm looking at performance implications in such cases.

@abhijit
I had compared the execution plans but forgot to mention there. The execution plan for both cases is exactly same. They start varying only when the SELECT clause is altered. So it seems like sql server does some smart manipulations and finds out that both would produce exactly same results. This is also evident from the result of experiment. They match neck to neck.

Yes, that is often the case/problem with legacy databases... ideal solution would be to correct the data architecture but no one is willing to risk their career and touch the database due to undocumented dependencies

It seems you have no choice but to address the shortcomings of the database design via SQL and accept the performance penalties whatever they may be.

Thread: LEFT JOIN versus INNER JOIN

Thread Tools

Display

LEFT JOIN versus INNER JOIN

Re: LEFT JOIN versus INNER JOIN

Re: LEFT JOIN versus INNER JOIN

Re: LEFT JOIN versus INNER JOIN

Re: LEFT JOIN versus INNER JOIN

Re: LEFT JOIN versus INNER JOIN

Re: LEFT JOIN versus INNER JOIN

Re: LEFT JOIN versus INNER JOIN

Re: LEFT JOIN versus INNER JOIN

Re: LEFT JOIN versus INNER JOIN

Re: LEFT JOIN versus INNER JOIN

Tags for this Thread

Posting Permissions