|
-
Dec 28th, 2009, 09:16 AM
#1
LEFT JOIN versus INNER JOIN
In situations where both would produce same result (e.g. foreign keys on not-nullable fields etc.), which one should be preferred performance wise - INNER JOIN or LEFT/RIGHT JOIN?
-
Dec 28th, 2009, 01:10 PM
#2
Re: LEFT JOIN versus INNER JOIN
Interesting question. I suppose a OUTER join would slow things compared to a INNER JOIN.
Everything that has a computer in will fail. Everything in your life, from a watch to a car to, you know, a radio, to an iPhone, it will fail if it has a computer in it. They should kill the people who made those things.- 'Woz'
save a blobFileStreamDataTable To Text Filemy blog
-
Dec 28th, 2009, 02:10 PM
#3
Re: LEFT JOIN versus INNER JOIN
Not sure, but I think inner join should be slower because it has additional overhead of removing unmatched rows, which left/right outer joins don't have.
-
Dec 28th, 2009, 03:13 PM
#4
Re: LEFT JOIN versus INNER JOIN
I believe as abhijit does. The outer join performs slower the the inner join.
Sometimes the Programmer
Sometimes the DBA
Mazz1
-
Dec 28th, 2009, 06:25 PM
#5
Re: LEFT JOIN versus INNER JOIN
I tried to compare the two types like this:
1. Created two tables:
sql Code:
USE [tempdb] GO SET ANSI_NULLS ON GO SET QUOTED_IDENTIFIER ON GO CREATE TABLE [dbo].[Table1]( [Column1] [int] NOT NULL, CONSTRAINT [PK_Table1] PRIMARY KEY CLUSTERED ( [Column1] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] ) ON [PRIMARY] GO CREATE TABLE [dbo].[Table2]( [Column1] [int] NOT NULL ) ON [PRIMARY] GO
2. Inserted some random records in both tables. Table1 is filled with numbers from 1 to 1000000, and table2 has random records selected from table1.
sql Code:
truncate table table1 truncate table table2 declare @i int set @i = 1 while @i < 1000000 begin insert into table1 values(@i) set @i =@i + 1 end select count(*) from table1 go insert into table2 select column1 from table1 where column1%3=0 or column1%7=0 select count(*) from table2 go insert into table2 select column1 from table1 where column1%3=0 or column1%7=0 select count(*) from table2 go insert into table2 select column1 from table1 where column1%5=0 or column1%11=0 select count(*) from table2 go insert into table2 select column1 from table1 where column1%4=2 select count(*) from table2 go
3. Created a stored procedure to benchmark results:
sql Code:
drop procedure BenchMarkPerformance GO Create Procedure BenchMarkPerformance AS SET NOCOUNT ON DECLARE @Query1 nvarchar(2048) DECLARE @Query2 nvarchar(2048) DECLARE @started datetime DECLARE @SampleSize bigint DECLARE @NumTime varchar(100) DECLARE @StrTime varchar(100) DECLARE @counter int set @Query1 = 'select table1.column1 from table2 left join table1 on table2.column1 = table1.column1' set @Query2 = 'select table1.column1 from table2 inner join table1 on table2.column1 = table1.column1' set @counter = 0; -- ********** SAMPLE SIZE ********** select @Samplesize = count(*) from table2; Print 'Sample Size : table2 = ' + cast(@SampleSize as varchar); select @Samplesize = count(*) from table1; Print 'Sample Size : table1 = ' + cast(@SampleSize as varchar); --*********************************** while @counter < 10 begin set @counter = @counter + 1; dbcc freeproccache WITH NO_INFOMSGS ; dbcc dropcleanbuffers WITH NO_INFOMSGS ; -- ********** STRING SEARCH ******** set @started = getdate(); EXECUTE sp_executesql @Query1; set @StrTime = ' LEFT JOIN: ' + CONVERT(VARCHAR(26), getdate() - @started, 114); --*********************************** -- ********** NUMERIC SEARCH ******** set @started = getdate(); EXECUTE sp_executesql @Query2; set @NumTime = ' INNER JOIN: ' + CONVERT(VARCHAR(26), getdate() - @started, 114); --*********************************** print Cast(@counter as varchar) + @StrTime + @NumTime; waitfor delay '00:00:01' end
4. Finally executed this stored procedure.
sql Code:
exec BenchMarkPerformance
5. Results:
Code:
Sample Size : table2 = 1379869
Sample Size : table1 = 999999
1 LEFT JOIN: 00:00:14:643 INNER JOIN: 00:00:13:483
2 LEFT JOIN: 00:00:14:280 INNER JOIN: 00:00:13:487
3 LEFT JOIN: 00:00:13:560 INNER JOIN: 00:00:14:017
4 LEFT JOIN: 00:00:13:330 INNER JOIN: 00:00:13:467
5 LEFT JOIN: 00:00:13:453 INNER JOIN: 00:00:14:453
6 LEFT JOIN: 00:00:13:390 INNER JOIN: 00:00:13:907
7 LEFT JOIN: 00:00:13:407 INNER JOIN: 00:00:13:500
8 LEFT JOIN: 00:00:13:403 INNER JOIN: 00:00:13:470
9 LEFT JOIN: 00:00:14:390 INNER JOIN: 00:00:13:547
10 LEFT JOIN: 00:00:13:423 INNER JOIN: 00:00:13:467
So from above it seems like there is no difference in performance of the two and the whole exercise was a waste.
-
Dec 28th, 2009, 06:58 PM
#6
Re: LEFT JOIN versus INNER JOIN
In general, INNER JOINS are faster, as they can eliminate results as soon as a mismatch is found based on the join. At least that's been my experience.
To answer the original question though, if you want to ALWAYS get the same results, then I'd go with the INNER JOIN .... BUT if you think you might want those mismatched records when they do occur, then go with the LEFT... the query should be based on the desired results... and users being an interesting lot, will find a way to hoark up the data and produce a mismatched result when you may (left join) or may not (inner join) want it.
-tg
-
Dec 28th, 2009, 07:25 PM
#7
Re: LEFT JOIN versus INNER JOIN
Syntax of query influences tuning but in the end it is the query execution plan that determines performance.
Your sample/test queries are too simple and return the same result. Think of it this way, in a 5-table join if you intend to get a 2% subset then you are better off eliminating as much of the data at the start of query execution plan rather than initially doing LEFT joins with the INNER join and WHERE criteria being applied to the end. You will only waste effort/resources (CPU cycles, temp storage, memory, etc) on comparisons/data that will be thrown away or not included in the final result so do the INNER join as early as possible.
Same goes if it was a manual paper process.... if you were asked to retrieve related documents from 5 piles then you will eliminate as much of them first (best if they are sorted or indexed) or extract only relevant documents from the piles. And finally you staple together the remaining documents that are relevant to your task; least amount of effort required on your part. You will avoid checking each and every document as much as possible.
But it also depends on how data is to be returned, or the columns to be returned.... sometimes you have no choice but to do a LEFT join to show that data from right-hand side of relationship does not exist (columns from right hand side needed) or for aggregation purposes (more common in data warehouse).
Last edited by leinad31; Dec 28th, 2009 at 07:28 PM.
-
Dec 29th, 2009, 08:45 AM
#8
Re: LEFT JOIN versus INNER JOIN
Pradeep,
I think your exercise should have checked the execution plan instead of the timing of the query results.
The execution plan will give you a more accurate picture of which operation will result in maximum cost.
Everything that has a computer in will fail. Everything in your life, from a watch to a car to, you know, a radio, to an iPhone, it will fail if it has a computer in it. They should kill the people who made those things.- 'Woz'
save a blobFileStreamDataTable To Text Filemy blog
-
Dec 29th, 2009, 10:01 AM
#9
Re: LEFT JOIN versus INNER JOIN
@tg, leinad
Yes, inner joins are definitely faster. But there comes many situations in our programs where the backend database supports null fields but is not allowed from the front end application. So in such cases, a minor bug may also cause a null value to be inserted there causing inner join to eleminate the rows in the output result where as the outer join seems like a safe bet. But I'm looking at performance implications in such cases.
@abhijit
I had compared the execution plans but forgot to mention there. The execution plan for both cases is exactly same. They start varying only when the SELECT clause is altered. So it seems like sql server does some smart manipulations and finds out that both would produce exactly same results. This is also evident from the result of experiment. They match neck to neck.
-
Dec 29th, 2009, 11:25 AM
#10
Re: LEFT JOIN versus INNER JOIN
well, if the inner join eliminates data that you require, then the inner join isn't an option. If that's the case, then you need the outer join. Which is what I was trying to point out (and I'll admit that I'm not entirely sure that's how it came out.)
-tg
-
Jan 3rd, 2010, 07:16 PM
#11
Re: LEFT JOIN versus INNER JOIN
 Originally Posted by Pradeep1210
@tg, leinad
Yes, inner joins are definitely faster. But there comes many situations in our programs where the backend database supports null fields but is not allowed from the front end application. So in such cases, a minor bug may also cause a null value to be inserted there causing inner join to eleminate the rows in the output result where as the outer join seems like a safe bet. But I'm looking at performance implications in such cases.
@abhijit
I had compared the execution plans but forgot to mention there. The execution plan for both cases is exactly same. They start varying only when the SELECT clause is altered. So it seems like sql server does some smart manipulations and finds out that both would produce exactly same results. This is also evident from the result of experiment. They match neck to neck.
Yes, that is often the case/problem with legacy databases... ideal solution would be to correct the data architecture but no one is willing to risk their career and touch the database due to undocumented dependencies It seems you have no choice but to address the shortcomings of the database design via SQL and accept the performance penalties whatever they may be.
Tags for this Thread
Posting Permissions
- You may not post new threads
- You may not post replies
- You may not post attachments
- You may not edit your posts
-
Forum Rules
|
Click Here to Expand Forum to Full Width
|