Results 1 to 6 of 6

Thread: [RESOLVED] Group by aggregate

  1. #1

    Thread Starter
    Super Moderator FunkyDexter's Avatar
    Join Date
    Apr 2005
    Location
    An obscure body in the SK system. The inhabitants call it Earth
    Posts
    7,957

    Resolved [RESOLVED] Group by aggregate

    I'm not sure the title really describes my thread well but I couldn't think of a succinct way of describing my question so here's the long version:-

    We have customers. Each customer can be interested in one or more products though they almost always are interested in just one. We refer to a customer with more than one interest as a multi product customer. A customers interest is stored in the customer interest table and a multi product customer is anyone who has more than one entry in this table. A Customer will always have at least one interest.

    I'm writing a query for a report and it needs to group by a customer's product interests, except that multi-product will be a single grouping. i.e. We need a row for customer who are only interested in widgets, one for customers who are only interested in doobries, one for thingummies and one for customer who are interested in more than one product.

    We're also grouping by several other fields but these are all single field foreign keys on the customer record so are pretty easy to handle. An example of this would be a customer's source (where we heard about them from).

    In essence that means I need to do something like this:-
    Code:
    Select sourceID as Source,
             Case when (Select Count(*) 
                            From CustomerInterest CI 
                            Where CI.CustomerID = C.CustomerID) > 1
                            then 0 --We Use 0 to indicate Multi Product
                    Else (Select ProductID 
                           From customerInterest CI
                           Where CI.CustomerID = C.CustomerID)
             End as CustomerInterest,
             Count(*) as CustomerCount
    From Customer C
    Group By sourceID
    That won't work because the CustomerID used in the case is in the Select list but not in the Group By or an aggregate. I obviously don't want to group by Customer ID because then I'll get a single record for every customer. What I really want to do is group by :-
    Code:
    Case when (Select Count(*) 
                            From CustomerInterest CI 
                            Where CI.CustomerID = C.CustomerID) > 1
                            then 0 --We'll Use 0 to indicate Multi Product
                    Else (Select ProductID 
                           From customerInterest CI
                           Where CI.CustomerID = C.CustomerID)
             End
    But it's got a count in and you can't put an aggregate function in the group by clause.

    Any suggestions on how to achieve this?

    P.S. I was working on this just before I left and, now I'm home, I don't have a sql server machine in front of me so please try and look past any spelling mistakes etc I've made above. I've just typed straight into the post.



    Edit> Just had a thought as I hit the send button. I can probably just group by the alias. E.g. Group by Source, CustomerInterest. I'll have to try that tomorrow to be sure.
    The best argument against democracy is a five minute conversation with the average voter - Winston Churchill

    Hadoop actually sounds more like the way they greet each other in Yorkshire - Inferrd

  2. #2
    PowerPoster
    Join Date
    Nov 2002
    Location
    Manila
    Posts
    7,629

    Re: Group by aggregate

    Count(*) is expensive, if sequential PK is available then I'd rather check via EXISTS current.pk + 1 = chk.pk to determine if its multi product (only two records read from PK to determine multi product cust rather than counting ALL records for customer). Also, the CASE is a giveaway that your biased to iterative/array logic rather than a set based problem solving POV.

    Might as well post columns with data type of relevant tables, along with available indexes

    Also, please clarify content of report. Only two columns? Counts the number of non multi customers per product with additional row counting multi-product cust?

  3. #3

    Thread Starter
    Super Moderator FunkyDexter's Avatar
    Join Date
    Apr 2005
    Location
    An obscure body in the SK system. The inhabitants call it Earth
    Posts
    7,957

    Re: Group by aggregate

    I deliberately didn't give the full detail because it's big and complex and I'm happy with everrything apart from the specific aspect I asked about. I didn't want all the extra detail to get in the way but here goes nothing. I'm actually creating a table that will hold the performance data of each of our telesellers by source type, market (commercial or domestic), nation (where the customers live), and customer product interest. This table is then going to be used for further analysis. Here's the query:-

    Code:
    SELECT	CH.UserID, 
    			SA.SourceTypeID, 
    			P.MarketID, 
    			P.CountryID, 
    			CASE WHEN (SELECT COUNT(*) FROM dbo.ProspectInterest PI WHERE PI.ProspectID = P.ProspectID) > 1 THEN 0 ELSE (SELECT PI.ProductGroupID FROM ProspectInterest PI WHERE PI.ProspectID = P.ProspectID) END AS ProductInterest, --Product Group or Multi Product
    			COUNT(*) AS Contacts, --Count of Contacts
    			ISNULL(SUM(CASE WHEN Appt.LeadID IS NULL THEN 0 ELSE 1 END), 0) AS NetAppts, --Sales Appts (handled by the fact that we have a left join so the appt will be null unless there is an appt)
    			CASE WHEN COUNT(*) > 20 THEN ISNULL(SUM(CASE WHEN Appt.LeadID IS NULL THEN 0 ELSE 1 END), 0) * 100/COUNT(*) ELSE NULL END AS Conversion  --Appts/Contacts as a percentage gives us the conversion rate but only if they've got at least 20 contacts or it won't be meaningful
    	INTO #BaseUserData		
    	FROM dbo.CallHistory CH
    	INNER JOIN dbo.CustomerContact CC
    		ON CH.CustomerContactID = CC.CustomerContactID
    	INNER JOIN dbo.StrategyCustomer SC
    		ON CC.CustomerID = SC.CustomerID
    	INNER JOIN dbo.Prospect P
    		ON SC.InitiatingProspectID = P.ProspectID
    	INNER JOIN dbo.Sources S
    		ON P.SourceID = S.SourceID
    	INNER JOIN dbo.Source_Accounts SA
    		ON S.SourceAccountID = SA.SourceAccountID
    	LEFT JOIN Leads L  --Left joing out to leads and appt so we can see which contacts led to appts
    		ON P.ProspectID = L.ProspectID
    		AND L.DateCreated >= @FromDate 
    		AND L.CreatedTypeID = 'T'
    	LEFT JOIN dbo.SalesAppointments Appt
    		ON L.LeadID = Appt.LeadID	
    	WHERE CH.CallResultID IN (4, 5, 6, 7, 8, 9, 10) --Only bother counting calls where we spoke to the customer
    	GROUP BY CH.UserID, SA.SourceTypeID, P.MarketID, P.CountryID, P.ProspectID
    	HAVING COUNT(*) >= 100  --Only bother creating records where there's at least 100 contacts and 2 appts or it won't be meaningful
    	AND ISNULL(SUM(CASE WHEN Appt.LeadID IS NULL THEN 0 ELSE 1 END), 0) >= 2
    A brief description of the relevant tables:-
    Call History: records the result (eg no reply, engaged, interested not interested etc) every time we've attempted to call a customer.
    CustomerContact: records all the contacts we've made (including letters etc) with a customer. A telephone call type contact cal have several call history records because we don't count a TC as fulfilled until we get an interested or not interested result
    Strategy Customer: Records a customer being in a marketting strategy and we use it to generate apropriate contacts for that customer. In the context of this query, though, it really just acts as a link to allow us to work out which of the customers prospects resulted in the telephone call contact being generated
    Prospect: an instance of us hearing about a customer from a source (a customer might apprach us at an exhibition, go home and ask for a brochure on the internet and also have their details sent to us via a business partner - each of these would be a prospect for that customer)
    Source: Somewhere we hear about customers from
    Source Account: A higher level grouping of source. E.G. Ideal Home Exhibition would be a Source Account, the 2009 Ideal Home Exhibition would be a source.
    Source Type: A higher level grouping of source accounts, eg Internet, exhibitions, partners etc
    Leads: a customer who is interested in an appointment - generated from a customer record by telesales.
    Sales Apponitment: A sales appointment - generated from a lead by central appointing.

    By the way, I've left it grouping by prospect at the moment just so the query compiles but that's not correct. I want to group by the individual product interest where there's only one or 'mjulti product' where there's more than one.

    NB A customer isn't someone who's actually bought from us but rather anyone we're marketting to (or have marketted to in the past).


    edit> EXISTS current.pk + 1 = chk.pk - Not sure I understand what you're getting at here but this will be running in an overnight job and doesn't need to be performant

    Re the CASE I'm quite happy with set based logic but I think the case is wholly neccessary in this case. I'm open to being corrected on that though.



    edit 2> Turns out you can't group vby the alias as I'd hoped but I have just had another thought (that's two in as many days! I'm on fire baby, baby!) I could probably structure a query that would return the ungrouped data then use it as a sub query in a second query that did the grouping. I'll try that.
    Last edited by FunkyDexter; Nov 12th, 2009 at 04:40 AM.
    The best argument against democracy is a five minute conversation with the average voter - Winston Churchill

    Hadoop actually sounds more like the way they greet each other in Yorkshire - Inferrd

  4. #4

    Thread Starter
    Super Moderator FunkyDexter's Avatar
    Join Date
    Apr 2005
    Location
    An obscure body in the SK system. The inhabitants call it Earth
    Posts
    7,957

    Re: Group by aggregate

    Huzzah! I'm there. In case anyone else has a similar need here is the whole query, warts and all, that I've ended up with. I hope it'll help:-
    Code:
    SELECT	UserID,
    			SourceTypeID,
    			MarketID,
    			CountryID,
    			ProductInterest,
    			COUNT(*) AS Contacts,
    			ISNULL(SUM(Appt), 0) AS NetAppts,
    			CASE WHEN COUNT(*) > 20 THEN ISNULL(SUM(Appt), 0) * 100/COUNT(*) ELSE NULL END AS Conversion  --Appts/Contacts as a percentage gives us the conversion rate but only if they've got at least 20 contacts or it won't be meaningful
    	FROM 	
    		(SELECT	CH.UserID, 
    				SA.SourceTypeID, 
    				P.MarketID, 
    				P.CountryID, 
    				CASE WHEN (SELECT COUNT(*) FROM dbo.ProspectInterest PI WHERE PI.ProspectID = P.ProspectID) > 1 THEN 0 ELSE (SELECT PI.ProductGroupID FROM ProspectInterest PI WHERE PI.ProspectID = P.ProspectID) END AS ProductInterest, --Product Group or Multi Product
    				CASE WHEN Appt.LeadID IS NULL THEN 0 ELSE 1 END AS Appt
    		FROM dbo.CallHistory CH
    		INNER JOIN dbo.CustomerContact CC
    			ON CH.CustomerContactID = CC.CustomerContactID
    		INNER JOIN dbo.StrategyCustomer SC
    			ON CC.CustomerID = SC.CustomerID
    		INNER JOIN dbo.Prospect P
    			ON SC.InitiatingProspectID = P.ProspectID
    		INNER JOIN dbo.Sources S
    			ON P.SourceID = S.SourceID
    		INNER JOIN dbo.Source_Accounts SA
    			ON S.SourceAccountID = SA.SourceAccountID
    		LEFT JOIN Leads L  --Left joing out to leads and appt so we can see which contacts led to appts
    			ON P.ProspectID = L.ProspectID
    			AND L.DateCreated >= @FromDate 
    			AND L.CreatedTypeID = 'T'
    		LEFT JOIN dbo.SalesAppointments Appt
    			ON L.LeadID = Appt.LeadID	
    		WHERE CH.CallResultID IN (4, 5, 6, 7, 8, 9, 10) --Only bother counting calls where we spoke to the customer
    		) RawData	
    	GROUP BY UserID, SourceTypeID, MarketID, CountryID, ProductInterest
    	HAVING COUNT(*) >= 100  --Only bother creating records where there's at least 100 contacts and 2 appts or it won't be meaningful
    	AND ISNULL(SUM(Appt), 0) >= 2
    Now all I've got to do is work out the same stats for the whole call centre so that I can express the user's performance as a function of the overall average. At this point my brain may be turning into cream cheese...
    The best argument against democracy is a five minute conversation with the average voter - Winston Churchill

    Hadoop actually sounds more like the way they greet each other in Yorkshire - Inferrd

  5. #5
    PowerPoster
    Join Date
    Nov 2002
    Location
    Manila
    Posts
    7,629

    Re: [RESOLVED] Group by aggregate

    Even if its an overnight job sooner or later the performance overhead will haunt you; it will compete for resources with other batch jobs, and data in source tables are increasing on a daily basis.

    Rule of thumb with regards to INNER JOINs; eliminate as many records as possible before performing the join. I'm surprised there's no date condition for CH, so you end up checking all calls even from years back.

  6. #6

    Thread Starter
    Super Moderator FunkyDexter's Avatar
    Join Date
    Apr 2005
    Location
    An obscure body in the SK system. The inhabitants call it Earth
    Posts
    7,957

    Re: [RESOLVED] Group by aggregate

    I'm surprised there's no date condition for CH
    OOPS! Good Catch. I removed the condition because my dev DB is a bit out of date so it was a cheap and cheerful way of getting data back during development. I then completely forgot to put it back in
    The best argument against democracy is a five minute conversation with the average voter - Winston Churchill

    Hadoop actually sounds more like the way they greet each other in Yorkshire - Inferrd

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  



Click Here to Expand Forum to Full Width