Results 1 to 18 of 18

Thread: [RESOLVED] Why is "not in" slower than "join"

  1. #1

    Thread Starter
    Addicted Member
    Join Date
    Feb 2006
    Location
    Hyderabad, India
    Posts
    233

    Resolved [RESOLVED] Why is "not in" slower than "join"

    I have this query to retrieve rows from table r which do not have a corresponding entry in the table ra.
    Code:
    SELECT name, r_id FROM r WHERE r_id NOT IN (SELECT r_id FROM ra GROUP BY r_id);
    This one is taking about 8 seconds to execute.
    On the other hand, this query
    Code:
    SELECT r.name, r.r_id FROM r 
    LEFT JOIN ra on r.r_id = ra.r_id
    WHERE ra.r_id is NULL;
    is taking less than a second.
    Can you explain the reason for the difference in time taken?
    Thank you.

  2. #2
    MS SQL Powerposter szlamany's Avatar
    Join Date
    Mar 2004
    Location
    Connecticut
    Posts
    18,263

    Re: Why is "not in" slower than "join"

    If you are using MS SQL then you can review the execution plan and see exactly why it's different.

    From a 30,000 ft view...

    I have always felt that JOIN is the natural SQL engine experience. The optimizer might actually be smart enough to go to the table ra first in that query (you would have to view the execution plan to really know that).

    NOT IN is a tough construct to understand. If you imagine for a second it might actually build a "list" and then hit each r_id against that list. Building that can't be fast...

    btw - you can use JOIN instead of LEFT JOIN and actually get rid of your WHERE statement. JOIN will only return a row from the table on the "left" if the sister row exists in the table on the "right" - which is what your WHERE clause is testing.

    *** Read the sticky in the DB forum about how to get your question answered quickly!! ***

    Please remember to rate posts! Rate any post you find helpful - even in old threads! Use the link to the left - "Rate this Post".

    Some Informative Links:
    [ SQL Rules to Live By ] [ Reserved SQL keywords ] [ When to use INDEX HINTS! ] [ Passing Multi-item Parameters to STORED PROCEDURES ]
    [ Solution to non-domain Windows Authentication ] [ Crazy things we do to shrink log files ] [ SQL 2005 Features ] [ Loading Pictures from DB ]

    MS MVP 2006, 2007, 2008

  3. #3
    Super Moderator FunkyDexter's Avatar
    Join Date
    Apr 2005
    Location
    An obscure body in the SK system. The inhabitants call it Earth
    Posts
    7,957

    Re: Why is "not in" slower than "join"

    btw - you can use JOIN instead of LEFT JOIN and actually get rid of your WHERE statement. JOIN will only return a row from the table on the "left" if the sister row exists in the table on the "right" - which is what your WHERE clause is testing.
    That won't work because he's actually test that there's not a row in the sister table.

    I must admit I tend to use a not exists clause for that myself but I've never performance tested it.
    The best argument against democracy is a five minute conversation with the average voter - Winston Churchill

    Hadoop actually sounds more like the way they greet each other in Yorkshire - Inferrd

  4. #4

    Thread Starter
    Addicted Member
    Join Date
    Feb 2006
    Location
    Hyderabad, India
    Posts
    233

    Re: Why is "not in" slower than "join"

    Quote Originally Posted by szlamany
    If you are using MS SQL then you can review the execution plan and see exactly why it's different.

    btw - you can use JOIN instead of LEFT JOIN and actually get rid of your WHERE statement. JOIN will only return a row from the table on the "left" if the sister row exists in the table on the "right" - which is what your WHERE clause is testing.
    I am using MySQL. Will 'EXPLAIN' review the execution plan.
    I am not sure if JOIN will do what I want. I want to know r_id's in table r which are NOT in table r_id. I can't claim to be knowledgeable than you but JOIN will give me rows which have entry in both tables.

  5. #5
    MS SQL Powerposter szlamany's Avatar
    Join Date
    Mar 2004
    Location
    Connecticut
    Posts
    18,263

    Re: Why is "not in" slower than "join"

    Quote Originally Posted by FunkyDexter
    That won't work because he's actually test that there's not a row in the sister table.
    oops - I didn't catch that the WHERE clause was looking for that condition!

    JOIN will not work in this situation...

    *** Read the sticky in the DB forum about how to get your question answered quickly!! ***

    Please remember to rate posts! Rate any post you find helpful - even in old threads! Use the link to the left - "Rate this Post".

    Some Informative Links:
    [ SQL Rules to Live By ] [ Reserved SQL keywords ] [ When to use INDEX HINTS! ] [ Passing Multi-item Parameters to STORED PROCEDURES ]
    [ Solution to non-domain Windows Authentication ] [ Crazy things we do to shrink log files ] [ SQL 2005 Features ] [ Loading Pictures from DB ]

    MS MVP 2006, 2007, 2008

  6. #6
    MS SQL Powerposter szlamany's Avatar
    Join Date
    Mar 2004
    Location
    Connecticut
    Posts
    18,263

    Re: Why is "not in" slower than "join"

    In MS SQL you can see an execution plan.

    I did this set of queries

    Code:
    Create Table #Test1 (ID1 int)
    Create Table #Test2 (ID2 int)
    
    Insert into #Test1 Select MasId From Retfiles.dbo.Person_T
    Insert into #Test2 Select MasId From Retfiles.dbo.Person_T Where MasId<15000
    
    Select * From #Test1 Left Join #Test2 on #Test2.Id2=#Test1.Id1 Where #Test2.Id2 is null
    
    Select * From #Test1 Where Id1 not in (Select Id2 From #Test2)
    
    Drop Table #Test1 Drop Table #Test2
    And this is the execution plan - note the additional steps taken in the second query.

    The plan only represents the two SELECT queries - you can see that the first one (JOIN) is 13% of the total batch time - the NOT IN was 87% of the total batch time. That's about 8 times slower.
    Attached Images Attached Images  
    Last edited by szlamany; Feb 13th, 2008 at 10:19 AM.

    *** Read the sticky in the DB forum about how to get your question answered quickly!! ***

    Please remember to rate posts! Rate any post you find helpful - even in old threads! Use the link to the left - "Rate this Post".

    Some Informative Links:
    [ SQL Rules to Live By ] [ Reserved SQL keywords ] [ When to use INDEX HINTS! ] [ Passing Multi-item Parameters to STORED PROCEDURES ]
    [ Solution to non-domain Windows Authentication ] [ Crazy things we do to shrink log files ] [ SQL 2005 Features ] [ Loading Pictures from DB ]

    MS MVP 2006, 2007, 2008

  7. #7

    Thread Starter
    Addicted Member
    Join Date
    Feb 2006
    Location
    Hyderabad, India
    Posts
    233

    Re: Why is "not in" slower than "join"

    There is a statement called 'EXPLAIN' which gives out info about the performance of a query in MySQL. And here is the link which gives detailed info about the 'EXPLAIN'.explain

  8. #8
    MS SQL Powerposter szlamany's Avatar
    Join Date
    Mar 2004
    Location
    Connecticut
    Posts
    18,263

    Re: Why is "not in" slower than "join"

    Have you used EXPLAIN on your queries to see if the info given is meaningful? The example on the link you gave is rather simplistic.

    *** Read the sticky in the DB forum about how to get your question answered quickly!! ***

    Please remember to rate posts! Rate any post you find helpful - even in old threads! Use the link to the left - "Rate this Post".

    Some Informative Links:
    [ SQL Rules to Live By ] [ Reserved SQL keywords ] [ When to use INDEX HINTS! ] [ Passing Multi-item Parameters to STORED PROCEDURES ]
    [ Solution to non-domain Windows Authentication ] [ Crazy things we do to shrink log files ] [ SQL 2005 Features ] [ Loading Pictures from DB ]

    MS MVP 2006, 2007, 2008

  9. #9

    Thread Starter
    Addicted Member
    Join Date
    Feb 2006
    Location
    Hyderabad, India
    Posts
    233

    Re: Why is "not in" slower than "join"

    I have attached screen shot of the result as given by phpmyadmin.
    First result is for the join, second one is for the not in.
    Attached Images Attached Images  

  10. #10
    Super Moderator FunkyDexter's Avatar
    Join Date
    Apr 2005
    Location
    An obscure body in the SK system. The inhabitants call it Earth
    Posts
    7,957

    Re: Why is "not in" slower than "join"

    I've never examined the results of explain before but looking at the results you got and checking against the info on your link, two things imediately stand out:-
    1. The link says that Type is the all important column and gives a list of which types are prefereble. In your results the join has an all and a ref, the 'in' has an all and an index. a ref is prefereable to an index so that's one reason it's faster.
    2. Look at the row count being processed. The join processes 1 and 5545, the 'in' processes 5530 and 5565. As it's a subquery I suspect you should actually multiply those two numbers to get a meaningful indication of performance, which means the 'in' would be something like 5000 time less efficient! I suspect that lots of other factors come into play meaning you won't see anything like that much difference but it's not a good starting point.



    That's about 8 times slower
    I'm surprised it makes that much difference . I don't suppose your in a position to test out what a 'not exists' would do are you?
    The best argument against democracy is a five minute conversation with the average voter - Winston Churchill

    Hadoop actually sounds more like the way they greet each other in Yorkshire - Inferrd

  11. #11

    Thread Starter
    Addicted Member
    Join Date
    Feb 2006
    Location
    Hyderabad, India
    Posts
    233

    Re: Why is "not in" slower than "join"

    With the "not in" we assume that the optimizer will retrieve all the r_id's from ra table and do something like
    Code:
    select name, r_id from r where r_id not in (1,2,3,4);
    But that doesn't seem to be the case. It appears that a full table scan of ra table is being performed for every row of table r. Is that how the optimizer works? Can we somehow force it to perform the subquery just once?

  12. #12
    MS SQL Powerposter szlamany's Avatar
    Join Date
    Mar 2004
    Location
    Connecticut
    Posts
    18,263

    Re: Why is "not in" slower than "join"

    Quote Originally Posted by srisa
    Can we somehow force it to perform the subquery just once?
    Yes - you can.

    That's what LEFT JOIN does!!!

    That's why it's faster - that's why it's more natural. It's the whole essence of the set-based query processor that SQL uses.

    *** Read the sticky in the DB forum about how to get your question answered quickly!! ***

    Please remember to rate posts! Rate any post you find helpful - even in old threads! Use the link to the left - "Rate this Post".

    Some Informative Links:
    [ SQL Rules to Live By ] [ Reserved SQL keywords ] [ When to use INDEX HINTS! ] [ Passing Multi-item Parameters to STORED PROCEDURES ]
    [ Solution to non-domain Windows Authentication ] [ Crazy things we do to shrink log files ] [ SQL 2005 Features ] [ Loading Pictures from DB ]

    MS MVP 2006, 2007, 2008

  13. #13

    Thread Starter
    Addicted Member
    Join Date
    Feb 2006
    Location
    Hyderabad, India
    Posts
    233

    Re: Why is "not in" slower than "join"

    Thanks folks. This was a good learning experience.

  14. #14
    MS SQL Powerposter szlamany's Avatar
    Join Date
    Mar 2004
    Location
    Connecticut
    Posts
    18,263

    Re: [RESOLVED] Why is "not in" slower than "join"

    Ok - I did these two and compared them

    Code:
    Select * From #Test1 Left Join #Test2 on #Test2.Id2=#Test1.Id1 Where #Test2.Id2 is null
    
    Select * From #Test1 Where not exists (Select Id2 From #Test2 Where #Test2.ID2=#Test1.ID1)
    And the NOT EXISTS was faster - but only by 1% over the whole run. Actually the two execution plans are pretty similar. The FILTER that the first one had is what the NOT EXISTS eliminated.

    That FILTER is the actual WHERE ID2 IS NULL construct.
    Attached Images Attached Images  

    *** Read the sticky in the DB forum about how to get your question answered quickly!! ***

    Please remember to rate posts! Rate any post you find helpful - even in old threads! Use the link to the left - "Rate this Post".

    Some Informative Links:
    [ SQL Rules to Live By ] [ Reserved SQL keywords ] [ When to use INDEX HINTS! ] [ Passing Multi-item Parameters to STORED PROCEDURES ]
    [ Solution to non-domain Windows Authentication ] [ Crazy things we do to shrink log files ] [ SQL 2005 Features ] [ Loading Pictures from DB ]

    MS MVP 2006, 2007, 2008

  15. #15

    Thread Starter
    Addicted Member
    Join Date
    Feb 2006
    Location
    Hyderabad, India
    Posts
    233

    Re: [RESOLVED] Why is "not in" slower than "join"

    And this is what EXPLAIN says for 'not exists'.
    Attached Images Attached Images  

  16. #16
    Super Moderator FunkyDexter's Avatar
    Join Date
    Apr 2005
    Location
    An obscure body in the SK system. The inhabitants call it Earth
    Posts
    7,957

    Re: [RESOLVED] Why is "not in" slower than "join"

    So it looks like the Not exists would probably be fairly efficient on MySQL but rotton on SQLServer. Left join's definitely the way to go on both though. Thanks for the results, guys.
    The best argument against democracy is a five minute conversation with the average voter - Winston Churchill

    Hadoop actually sounds more like the way they greet each other in Yorkshire - Inferrd

  17. #17
    MS SQL Powerposter szlamany's Avatar
    Join Date
    Mar 2004
    Location
    Connecticut
    Posts
    18,263

    Re: [RESOLVED] Why is "not in" slower than "join"

    Actually NOT EXISTS was 49% of the run time in that example - the LEFT JOIN was 51%. Basically the query optimizer selected nearly identical execution plans - just a different hash match at the end.

    *** Read the sticky in the DB forum about how to get your question answered quickly!! ***

    Please remember to rate posts! Rate any post you find helpful - even in old threads! Use the link to the left - "Rate this Post".

    Some Informative Links:
    [ SQL Rules to Live By ] [ Reserved SQL keywords ] [ When to use INDEX HINTS! ] [ Passing Multi-item Parameters to STORED PROCEDURES ]
    [ Solution to non-domain Windows Authentication ] [ Crazy things we do to shrink log files ] [ SQL 2005 Features ] [ Loading Pictures from DB ]

    MS MVP 2006, 2007, 2008

  18. #18
    Super Moderator FunkyDexter's Avatar
    Join Date
    Apr 2005
    Location
    An obscure body in the SK system. The inhabitants call it Earth
    Posts
    7,957

    Re: [RESOLVED] Why is "not in" slower than "join"

    Oh I see. Sorry, I missread your post. Pretty similar overall then.
    The best argument against democracy is a five minute conversation with the average voter - Winston Churchill

    Hadoop actually sounds more like the way they greet each other in Yorkshire - Inferrd

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  



Click Here to Expand Forum to Full Width