[RESOLVED] Group by aggregate

**FunkyDexter** · Nov 11th, 2009, 01:55 PM

I'm not sure the title really describes my thread well but I couldn't think of a succinct way of describing my question so here's the long version:-

We have customers. Each customer can be interested in one or more products though they almost always are interested in just one. We refer to a customer with more than one interest as a multi product customer. A customers interest is stored in the customer interest table and a multi product customer is anyone who has more than one entry in this table. A Customer will always have at least one interest.

I'm writing a query for a report and it needs to group by a customer's product interests, except that multi-product will be a single grouping. i.e. We need a row for customer who are only interested in widgets, one for customers who are only interested in doobries, one for thingummies and one for customer who are interested in more than one product.

We're also grouping by several other fields but these are all single field foreign keys on the customer record so are pretty easy to handle. An example of this would be a customer's source (where we heard about them from).

In essence that means I need to do something like this:-

Code:

Select sourceID as Source,
         Case when (Select Count(*) 
                        From CustomerInterest CI 
                        Where CI.CustomerID = C.CustomerID) > 1
                        then 0 --We Use 0 to indicate Multi Product
                Else (Select ProductID 
                       From customerInterest CI
                       Where CI.CustomerID = C.CustomerID)
         End as CustomerInterest,
         Count(*) as CustomerCount
From Customer C
Group By sourceID

That won't work because the CustomerID used in the case is in the Select list but not in the Group By or an aggregate. I obviously don't want to group by Customer ID because then I'll get a single record for every customer. What I really want to do is group by :-

Code:

Case when (Select Count(*) 
                        From CustomerInterest CI 
                        Where CI.CustomerID = C.CustomerID) > 1
                        then 0 --We'll Use 0 to indicate Multi Product
                Else (Select ProductID 
                       From customerInterest CI
                       Where CI.CustomerID = C.CustomerID)
         End

But it's got a count in and you can't put an aggregate function in the group by clause.

Any suggestions on how to achieve this?

P.S. I was working on this just before I left and, now I'm home, I don't have a sql server machine in front of me so please try and look past any spelling mistakes etc I've made above. I've just typed straight into the post.

Edit> Just had a thought as I hit the send button. I can probably just group by the alias. E.g. Group by Source, CustomerInterest. I'll have to try that tomorrow to be sure.

**leinad31** · Nov 11th, 2009, 07:13 PM

Count(*) is expensive, if sequential PK is available then I'd rather check via EXISTS current.pk + 1 = chk.pk to determine if its multi product (only two records read from PK to determine multi product cust rather than counting ALL records for customer). Also, the CASE is a giveaway that your biased to iterative/array logic rather than a set based problem solving POV.

Might as well post columns with data type of relevant tables, along with available indexes

Also, please clarify content of report. Only two columns? Counts the number of non multi customers per product with additional row counting multi-product cust?

**FunkyDexter** · Nov 12th, 2009, 04:28 AM

I deliberately didn't give the full detail because it's big and complex and I'm happy with everrything apart from the specific aspect I asked about. I didn't want all the extra detail to get in the way but here goes nothing. I'm actually creating a table that will hold the performance data of each of our telesellers by source type, market (commercial or domestic), nation (where the customers live), and customer product interest. This table is then going to be used for further analysis. Here's the query:-

Code:

SELECT	CH.UserID, 
			SA.SourceTypeID, 
			P.MarketID, 
			P.CountryID, 
			CASE WHEN (SELECT COUNT(*) FROM dbo.ProspectInterest PI WHERE PI.ProspectID = P.ProspectID) > 1 THEN 0 ELSE (SELECT PI.ProductGroupID FROM ProspectInterest PI WHERE PI.ProspectID = P.ProspectID) END AS ProductInterest, --Product Group or Multi Product
			COUNT(*) AS Contacts, --Count of Contacts
			ISNULL(SUM(CASE WHEN Appt.LeadID IS NULL THEN 0 ELSE 1 END), 0) AS NetAppts, --Sales Appts (handled by the fact that we have a left join so the appt will be null unless there is an appt)
			CASE WHEN COUNT(*) > 20 THEN ISNULL(SUM(CASE WHEN Appt.LeadID IS NULL THEN 0 ELSE 1 END), 0) * 100/COUNT(*) ELSE NULL END AS Conversion  --Appts/Contacts as a percentage gives us the conversion rate but only if they've got at least 20 contacts or it won't be meaningful
	INTO #BaseUserData		
	FROM dbo.CallHistory CH
	INNER JOIN dbo.CustomerContact CC
		ON CH.CustomerContactID = CC.CustomerContactID
	INNER JOIN dbo.StrategyCustomer SC
		ON CC.CustomerID = SC.CustomerID
	INNER JOIN dbo.Prospect P
		ON SC.InitiatingProspectID = P.ProspectID
	INNER JOIN dbo.Sources S
		ON P.SourceID = S.SourceID
	INNER JOIN dbo.Source_Accounts SA
		ON S.SourceAccountID = SA.SourceAccountID
	LEFT JOIN Leads L  --Left joing out to leads and appt so we can see which contacts led to appts
		ON P.ProspectID = L.ProspectID
		AND L.DateCreated >= @FromDate 
		AND L.CreatedTypeID = 'T'
	LEFT JOIN dbo.SalesAppointments Appt
		ON L.LeadID = Appt.LeadID	
	WHERE CH.CallResultID IN (4, 5, 6, 7, 8, 9, 10) --Only bother counting calls where we spoke to the customer
	GROUP BY CH.UserID, SA.SourceTypeID, P.MarketID, P.CountryID, P.ProspectID
	HAVING COUNT(*) >= 100  --Only bother creating records where there's at least 100 contacts and 2 appts or it won't be meaningful
	AND ISNULL(SUM(CASE WHEN Appt.LeadID IS NULL THEN 0 ELSE 1 END), 0) >= 2

A brief description of the relevant tables:-
Call History: records the result (eg no reply, engaged, interested not interested etc) every time we've attempted to call a customer.
CustomerContact: records all the contacts we've made (including letters etc) with a customer. A telephone call type contact cal have several call history records because we don't count a TC as fulfilled until we get an interested or not interested result
Strategy Customer: Records a customer being in a marketting strategy and we use it to generate apropriate contacts for that customer. In the context of this query, though, it really just acts as a link to allow us to work out which of the customers prospects resulted in the telephone call contact being generated
Prospect: an instance of us hearing about a customer from a source (a customer might apprach us at an exhibition, go home and ask for a brochure on the internet and also have their details sent to us via a business partner - each of these would be a prospect for that customer)
Source: Somewhere we hear about customers from
Source Account: A higher level grouping of source. E.G. Ideal Home Exhibition would be a Source Account, the 2009 Ideal Home Exhibition would be a source.
Source Type: A higher level grouping of source accounts, eg Internet, exhibitions, partners etc
Leads: a customer who is interested in an appointment - generated from a customer record by telesales.
Sales Apponitment: A sales appointment - generated from a lead by central appointing.

By the way, I've left it grouping by prospect at the moment just so the query compiles but that's not correct. I want to group by the individual product interest where there's only one or 'mjulti product' where there's more than one.

NB A customer isn't someone who's actually bought from us but rather anyone we're marketting to (or have marketted to in the past).

edit> EXISTS current.pk + 1 = chk.pk - Not sure I understand what you're getting at here but this will be running in an overnight job and doesn't need to be performant

Re the CASE I'm quite happy with set based logic but I think the case is wholly neccessary in this case. I'm open to being corrected on that though.

edit 2> Turns out you can't group vby the alias as I'd hoped but I have just had another thought (that's two in as many days! I'm on fire baby, baby!) I could probably structure a query that would return the ungrouped data then use it as a sub query in a second query that did the grouping. I'll try that.

**FunkyDexter** · Nov 12th, 2009, 05:55 AM

Huzzah! I'm there. In case anyone else has a similar need here is the whole query, warts and all, that I've ended up with. I hope it'll help:-

Code:

SELECT	UserID,
			SourceTypeID,
			MarketID,
			CountryID,
			ProductInterest,
			COUNT(*) AS Contacts,
			ISNULL(SUM(Appt), 0) AS NetAppts,
			CASE WHEN COUNT(*) > 20 THEN ISNULL(SUM(Appt), 0) * 100/COUNT(*) ELSE NULL END AS Conversion  --Appts/Contacts as a percentage gives us the conversion rate but only if they've got at least 20 contacts or it won't be meaningful
	FROM 	
		(SELECT	CH.UserID, 
				SA.SourceTypeID, 
				P.MarketID, 
				P.CountryID, 
				CASE WHEN (SELECT COUNT(*) FROM dbo.ProspectInterest PI WHERE PI.ProspectID = P.ProspectID) > 1 THEN 0 ELSE (SELECT PI.ProductGroupID FROM ProspectInterest PI WHERE PI.ProspectID = P.ProspectID) END AS ProductInterest, --Product Group or Multi Product
				CASE WHEN Appt.LeadID IS NULL THEN 0 ELSE 1 END AS Appt
		FROM dbo.CallHistory CH
		INNER JOIN dbo.CustomerContact CC
			ON CH.CustomerContactID = CC.CustomerContactID
		INNER JOIN dbo.StrategyCustomer SC
			ON CC.CustomerID = SC.CustomerID
		INNER JOIN dbo.Prospect P
			ON SC.InitiatingProspectID = P.ProspectID
		INNER JOIN dbo.Sources S
			ON P.SourceID = S.SourceID
		INNER JOIN dbo.Source_Accounts SA
			ON S.SourceAccountID = SA.SourceAccountID
		LEFT JOIN Leads L  --Left joing out to leads and appt so we can see which contacts led to appts
			ON P.ProspectID = L.ProspectID
			AND L.DateCreated >= @FromDate 
			AND L.CreatedTypeID = 'T'
		LEFT JOIN dbo.SalesAppointments Appt
			ON L.LeadID = Appt.LeadID	
		WHERE CH.CallResultID IN (4, 5, 6, 7, 8, 9, 10) --Only bother counting calls where we spoke to the customer
		) RawData	
	GROUP BY UserID, SourceTypeID, MarketID, CountryID, ProductInterest
	HAVING COUNT(*) >= 100  --Only bother creating records where there's at least 100 contacts and 2 appts or it won't be meaningful
	AND ISNULL(SUM(Appt), 0) >= 2

Now all I've got to do is work out the same stats for the whole call centre so that I can express the user's performance as a function of the overall average. At this point my brain may be turning into cream cheese...

**leinad31** · Nov 12th, 2009, 10:21 PM

Even if its an overnight job sooner or later the performance overhead will haunt you; it will compete for resources with other batch jobs, and data in source tables are increasing on a daily basis.

Rule of thumb with regards to INNER JOINs; eliminate as many records as possible before performing the join. I'm surprised there's no date condition for CH, so you end up checking all calls even from years back.

**FunkyDexter** · Nov 13th, 2009, 04:01 AM

I'm surprised there's no date condition for CH

OOPS! Good Catch. I removed the condition because my dev DB is a bit out of date so it was a cheap and cheerful way of getting data back during development. I then completely forgot to put it back in

Thread: [RESOLVED] Group by aggregate

Thread Tools

Display

[RESOLVED] Group by aggregate

Re: Group by aggregate

Re: Group by aggregate

Re: Group by aggregate

Re: [RESOLVED] Group by aggregate

Re: [RESOLVED] Group by aggregate

Posting Permissions