If you are using MS SQL then you can review the execution plan and see exactly why it's different.
From a 30,000 ft view...
I have always felt that JOIN is the natural SQL engine experience. The optimizer might actually be smart enough to go to the table ra first in that query (you would have to view the execution plan to really know that).
NOT IN is a tough construct to understand. If you imagine for a second it might actually build a "list" and then hit each r_id against that list. Building that can't be fast...
btw - you can use JOIN instead of LEFT JOIN and actually get rid of your WHERE statement. JOIN will only return a row from the table on the "left" if the sister row exists in the table on the "right" - which is what your WHERE clause is testing.
*** Read the sticky in the DB forum about how to get your question answered quickly!! ***
Please remember to rate posts! Rate any post you find helpful - even in old threads! Use the link to the left - "Rate this Post".
An obscure body in the SK system. The inhabitants call it Earth
Posts
7,957
Re: Why is "not in" slower than "join"
btw - you can use JOIN instead of LEFT JOIN and actually get rid of your WHERE statement. JOIN will only return a row from the table on the "left" if the sister row exists in the table on the "right" - which is what your WHERE clause is testing.
That won't work because he's actually test that there's not a row in the sister table.
I must admit I tend to use a not exists clause for that myself but I've never performance tested it.
The best argument against democracy is a five minute conversation with the average voter - Winston Churchill
Hadoop actually sounds more like the way they greet each other in Yorkshire - Inferrd
If you are using MS SQL then you can review the execution plan and see exactly why it's different.
btw - you can use JOIN instead of LEFT JOIN and actually get rid of your WHERE statement. JOIN will only return a row from the table on the "left" if the sister row exists in the table on the "right" - which is what your WHERE clause is testing.
I am using MySQL. Will 'EXPLAIN' review the execution plan.
I am not sure if JOIN will do what I want. I want to know r_id's in table r which are NOT in table r_id. I can't claim to be knowledgeable than you but JOIN will give me rows which have entry in both tables.
Create Table #Test1 (ID1 int)
Create Table #Test2 (ID2 int)
Insert into #Test1 Select MasId From Retfiles.dbo.Person_T
Insert into #Test2 Select MasId From Retfiles.dbo.Person_T Where MasId<15000
Select * From #Test1 Left Join #Test2 on #Test2.Id2=#Test1.Id1 Where #Test2.Id2 is null
Select * From #Test1 Where Id1 not in (Select Id2 From #Test2)
Drop Table #Test1 Drop Table #Test2
And this is the execution plan - note the additional steps taken in the second query.
The plan only represents the two SELECT queries - you can see that the first one (JOIN) is 13% of the total batch time - the NOT IN was 87% of the total batch time. That's about 8 times slower.
Last edited by szlamany; Feb 13th, 2008 at 10:19 AM.
*** Read the sticky in the DB forum about how to get your question answered quickly!! ***
Please remember to rate posts! Rate any post you find helpful - even in old threads! Use the link to the left - "Rate this Post".
There is a statement called 'EXPLAIN' which gives out info about the performance of a query in MySQL. And here is the link which gives detailed info about the 'EXPLAIN'.explain
An obscure body in the SK system. The inhabitants call it Earth
Posts
7,957
Re: Why is "not in" slower than "join"
I've never examined the results of explain before but looking at the results you got and checking against the info on your link, two things imediately stand out:-
1. The link says that Type is the all important column and gives a list of which types are prefereble. In your results the join has an all and a ref, the 'in' has an all and an index. a ref is prefereable to an index so that's one reason it's faster.
2. Look at the row count being processed. The join processes 1 and 5545, the 'in' processes 5530 and 5565. As it's a subquery I suspect you should actually multiply those two numbers to get a meaningful indication of performance, which means the 'in' would be something like 5000 time less efficient! I suspect that lots of other factors come into play meaning you won't see anything like that much difference but it's not a good starting point.
That's about 8 times slower
I'm surprised it makes that much difference . I don't suppose your in a position to test out what a 'not exists' would do are you?
The best argument against democracy is a five minute conversation with the average voter - Winston Churchill
Hadoop actually sounds more like the way they greet each other in Yorkshire - Inferrd
With the "not in" we assume that the optimizer will retrieve all the r_id's from ra table and do something like
Code:
select name, r_id from r where r_id not in (1,2,3,4);
But that doesn't seem to be the case. It appears that a full table scan of ra table is being performed for every row of table r. Is that how the optimizer works? Can we somehow force it to perform the subquery just once?
Select * From #Test1 Left Join #Test2 on #Test2.Id2=#Test1.Id1 Where #Test2.Id2 is null
Select * From #Test1 Where not exists (Select Id2 From #Test2 Where #Test2.ID2=#Test1.ID1)
And the NOT EXISTS was faster - but only by 1% over the whole run. Actually the two execution plans are pretty similar. The FILTER that the first one had is what the NOT EXISTS eliminated.
That FILTER is the actual WHERE ID2 IS NULL construct.
*** Read the sticky in the DB forum about how to get your question answered quickly!! ***
Please remember to rate posts! Rate any post you find helpful - even in old threads! Use the link to the left - "Rate this Post".
An obscure body in the SK system. The inhabitants call it Earth
Posts
7,957
Re: [RESOLVED] Why is "not in" slower than "join"
So it looks like the Not exists would probably be fairly efficient on MySQL but rotton on SQLServer. Left join's definitely the way to go on both though. Thanks for the results, guys.
The best argument against democracy is a five minute conversation with the average voter - Winston Churchill
Hadoop actually sounds more like the way they greet each other in Yorkshire - Inferrd
Actually NOT EXISTS was 49% of the run time in that example - the LEFT JOIN was 51%. Basically the query optimizer selected nearly identical execution plans - just a different hash match at the end.
*** Read the sticky in the DB forum about how to get your question answered quickly!! ***
Please remember to rate posts! Rate any post you find helpful - even in old threads! Use the link to the left - "Rate this Post".