Efficient way to JOIN 2 tables
Hi all,
This is a subjective question for MS SQL Server 2005. I have been doing some consultancy work for a big organization for past few months. Working mainly on SSRS, SSAS and writing ad-hoc queries for them dolts.
I was supposed to get data (most of the time from these 2 tables, 1 is Sales / Transactions and other is Inventory). Sales has almost 10 crores (100 millions, if I am not wrong because 1 crore = 100000000) records while Inventory has 1 crore (10 millions) records.
I am supposed to get data from them between specified dates. Say I am pulling data from Sales from 1 Jan 2009 to 31 Jan 2009. Then I need to join this sub selected data to Inventory data and get final desired output.
All is fine.
But I noticed something. We used to first sub-select relevant data from Sales (selecting data between 2 dates) and then join to Inventory. Last time, I joined the tables first (100 millions into 10 millions) and then applying dates filter. And this seemed to run faster than our first approach.
Why and how is this happening? I believed that joining and sub selecting will create temp tables in master DB. Am I right in saying this? We are either LEFT JOINING or INNER JOINING the tables.
Re: Efficient way to JOIN 2 tables
I'm not sure I'm understanding you. Do you mean the difference between this:-
Code:
Select *
From Table1
Inner Join Table 2
On Table1.Field1 = Table2.Field1
And Table1.DateField Between (StartDate and EndDate)
And this:-
Code:
Select *
From Table1
Inner Join Table 2
On Table1.Field1 = Table2.Field1
Where Table1.DateField Between (StartDate and EndDate)
?
If so I did some brief experimenting on it a while back (we had an office argument over it) and didn't find a significant difference in performance. I seem to remember the execution plans were identical but I'm not 100% sure on that so check for yourself.
I suspect that SQLServer's precompiler generally applies any filters, whether they're part of the where clause or the join at the first opportunity because that will be the most efficient approach. In this case that would be before performing the merge join on the datasets.
Re: Efficient way to JOIN 2 tables
I mean difference between these:
Code:
Select *
From Table1
Inner Join Table 2
On Table1.Field1 = Table2.Field1
WHERE Table1.DateField Between (StartDate and EndDate)
And:
Code:
SELECT T1.*, T2.* FROM
(
SELECT * from Table1 WHERE Table1.DateField Between (StartDate and EndDate)
)T1
INNER JOIN Table2 T2
ON T2.Field1 = T2.Field1
Re: Efficient way to JOIN 2 tables
I really don't know in that case. Do you really need to structure you're query like that, though? It's logically identical to the two layouts I piosted and alot less readable.