|
-
Nov 11th, 2009, 01:55 PM
#1
[RESOLVED] Group by aggregate
I'm not sure the title really describes my thread well but I couldn't think of a succinct way of describing my question so here's the long version:-
We have customers. Each customer can be interested in one or more products though they almost always are interested in just one. We refer to a customer with more than one interest as a multi product customer. A customers interest is stored in the customer interest table and a multi product customer is anyone who has more than one entry in this table. A Customer will always have at least one interest.
I'm writing a query for a report and it needs to group by a customer's product interests, except that multi-product will be a single grouping. i.e. We need a row for customer who are only interested in widgets, one for customers who are only interested in doobries, one for thingummies and one for customer who are interested in more than one product.
We're also grouping by several other fields but these are all single field foreign keys on the customer record so are pretty easy to handle. An example of this would be a customer's source (where we heard about them from).
In essence that means I need to do something like this:-
Code:
Select sourceID as Source,
Case when (Select Count(*)
From CustomerInterest CI
Where CI.CustomerID = C.CustomerID) > 1
then 0 --We Use 0 to indicate Multi Product
Else (Select ProductID
From customerInterest CI
Where CI.CustomerID = C.CustomerID)
End as CustomerInterest,
Count(*) as CustomerCount
From Customer C
Group By sourceID
That won't work because the CustomerID used in the case is in the Select list but not in the Group By or an aggregate. I obviously don't want to group by Customer ID because then I'll get a single record for every customer. What I really want to do is group by :-
Code:
Case when (Select Count(*)
From CustomerInterest CI
Where CI.CustomerID = C.CustomerID) > 1
then 0 --We'll Use 0 to indicate Multi Product
Else (Select ProductID
From customerInterest CI
Where CI.CustomerID = C.CustomerID)
End
But it's got a count in and you can't put an aggregate function in the group by clause.
Any suggestions on how to achieve this?
P.S. I was working on this just before I left and, now I'm home, I don't have a sql server machine in front of me so please try and look past any spelling mistakes etc I've made above. I've just typed straight into the post.
Edit> Just had a thought as I hit the send button. I can probably just group by the alias. E.g. Group by Source, CustomerInterest. I'll have to try that tomorrow to be sure.
The best argument against democracy is a five minute conversation with the average voter - Winston Churchill
Hadoop actually sounds more like the way they greet each other in Yorkshire - Inferrd
-
Nov 11th, 2009, 07:13 PM
#2
Re: Group by aggregate
Count(*) is expensive, if sequential PK is available then I'd rather check via EXISTS current.pk + 1 = chk.pk to determine if its multi product (only two records read from PK to determine multi product cust rather than counting ALL records for customer). Also, the CASE is a giveaway that your biased to iterative/array logic rather than a set based problem solving POV.
Might as well post columns with data type of relevant tables, along with available indexes
Also, please clarify content of report. Only two columns? Counts the number of non multi customers per product with additional row counting multi-product cust?
-
Nov 12th, 2009, 04:28 AM
#3
Re: Group by aggregate
I deliberately didn't give the full detail because it's big and complex and I'm happy with everrything apart from the specific aspect I asked about. I didn't want all the extra detail to get in the way but here goes nothing. I'm actually creating a table that will hold the performance data of each of our telesellers by source type, market (commercial or domestic), nation (where the customers live), and customer product interest. This table is then going to be used for further analysis. Here's the query:-
Code:
SELECT CH.UserID,
SA.SourceTypeID,
P.MarketID,
P.CountryID,
CASE WHEN (SELECT COUNT(*) FROM dbo.ProspectInterest PI WHERE PI.ProspectID = P.ProspectID) > 1 THEN 0 ELSE (SELECT PI.ProductGroupID FROM ProspectInterest PI WHERE PI.ProspectID = P.ProspectID) END AS ProductInterest, --Product Group or Multi Product
COUNT(*) AS Contacts, --Count of Contacts
ISNULL(SUM(CASE WHEN Appt.LeadID IS NULL THEN 0 ELSE 1 END), 0) AS NetAppts, --Sales Appts (handled by the fact that we have a left join so the appt will be null unless there is an appt)
CASE WHEN COUNT(*) > 20 THEN ISNULL(SUM(CASE WHEN Appt.LeadID IS NULL THEN 0 ELSE 1 END), 0) * 100/COUNT(*) ELSE NULL END AS Conversion --Appts/Contacts as a percentage gives us the conversion rate but only if they've got at least 20 contacts or it won't be meaningful
INTO #BaseUserData
FROM dbo.CallHistory CH
INNER JOIN dbo.CustomerContact CC
ON CH.CustomerContactID = CC.CustomerContactID
INNER JOIN dbo.StrategyCustomer SC
ON CC.CustomerID = SC.CustomerID
INNER JOIN dbo.Prospect P
ON SC.InitiatingProspectID = P.ProspectID
INNER JOIN dbo.Sources S
ON P.SourceID = S.SourceID
INNER JOIN dbo.Source_Accounts SA
ON S.SourceAccountID = SA.SourceAccountID
LEFT JOIN Leads L --Left joing out to leads and appt so we can see which contacts led to appts
ON P.ProspectID = L.ProspectID
AND L.DateCreated >= @FromDate
AND L.CreatedTypeID = 'T'
LEFT JOIN dbo.SalesAppointments Appt
ON L.LeadID = Appt.LeadID
WHERE CH.CallResultID IN (4, 5, 6, 7, 8, 9, 10) --Only bother counting calls where we spoke to the customer
GROUP BY CH.UserID, SA.SourceTypeID, P.MarketID, P.CountryID, P.ProspectID
HAVING COUNT(*) >= 100 --Only bother creating records where there's at least 100 contacts and 2 appts or it won't be meaningful
AND ISNULL(SUM(CASE WHEN Appt.LeadID IS NULL THEN 0 ELSE 1 END), 0) >= 2
A brief description of the relevant tables:-
Call History: records the result (eg no reply, engaged, interested not interested etc) every time we've attempted to call a customer.
CustomerContact: records all the contacts we've made (including letters etc) with a customer. A telephone call type contact cal have several call history records because we don't count a TC as fulfilled until we get an interested or not interested result
Strategy Customer: Records a customer being in a marketting strategy and we use it to generate apropriate contacts for that customer. In the context of this query, though, it really just acts as a link to allow us to work out which of the customers prospects resulted in the telephone call contact being generated
Prospect: an instance of us hearing about a customer from a source (a customer might apprach us at an exhibition, go home and ask for a brochure on the internet and also have their details sent to us via a business partner - each of these would be a prospect for that customer)
Source: Somewhere we hear about customers from
Source Account: A higher level grouping of source. E.G. Ideal Home Exhibition would be a Source Account, the 2009 Ideal Home Exhibition would be a source.
Source Type: A higher level grouping of source accounts, eg Internet, exhibitions, partners etc
Leads: a customer who is interested in an appointment - generated from a customer record by telesales.
Sales Apponitment: A sales appointment - generated from a lead by central appointing.
By the way, I've left it grouping by prospect at the moment just so the query compiles but that's not correct. I want to group by the individual product interest where there's only one or 'mjulti product' where there's more than one.
NB A customer isn't someone who's actually bought from us but rather anyone we're marketting to (or have marketted to in the past).
edit> EXISTS current.pk + 1 = chk.pk - Not sure I understand what you're getting at here but this will be running in an overnight job and doesn't need to be performant
Re the CASE I'm quite happy with set based logic but I think the case is wholly neccessary in this case. I'm open to being corrected on that though.
edit 2> Turns out you can't group vby the alias as I'd hoped but I have just had another thought (that's two in as many days! I'm on fire baby, baby!) I could probably structure a query that would return the ungrouped data then use it as a sub query in a second query that did the grouping. I'll try that.
Last edited by FunkyDexter; Nov 12th, 2009 at 04:40 AM.
The best argument against democracy is a five minute conversation with the average voter - Winston Churchill
Hadoop actually sounds more like the way they greet each other in Yorkshire - Inferrd
-
Nov 12th, 2009, 05:55 AM
#4
Re: Group by aggregate
Huzzah! I'm there. In case anyone else has a similar need here is the whole query, warts and all, that I've ended up with. I hope it'll help:-
Code:
SELECT UserID,
SourceTypeID,
MarketID,
CountryID,
ProductInterest,
COUNT(*) AS Contacts,
ISNULL(SUM(Appt), 0) AS NetAppts,
CASE WHEN COUNT(*) > 20 THEN ISNULL(SUM(Appt), 0) * 100/COUNT(*) ELSE NULL END AS Conversion --Appts/Contacts as a percentage gives us the conversion rate but only if they've got at least 20 contacts or it won't be meaningful
FROM
(SELECT CH.UserID,
SA.SourceTypeID,
P.MarketID,
P.CountryID,
CASE WHEN (SELECT COUNT(*) FROM dbo.ProspectInterest PI WHERE PI.ProspectID = P.ProspectID) > 1 THEN 0 ELSE (SELECT PI.ProductGroupID FROM ProspectInterest PI WHERE PI.ProspectID = P.ProspectID) END AS ProductInterest, --Product Group or Multi Product
CASE WHEN Appt.LeadID IS NULL THEN 0 ELSE 1 END AS Appt
FROM dbo.CallHistory CH
INNER JOIN dbo.CustomerContact CC
ON CH.CustomerContactID = CC.CustomerContactID
INNER JOIN dbo.StrategyCustomer SC
ON CC.CustomerID = SC.CustomerID
INNER JOIN dbo.Prospect P
ON SC.InitiatingProspectID = P.ProspectID
INNER JOIN dbo.Sources S
ON P.SourceID = S.SourceID
INNER JOIN dbo.Source_Accounts SA
ON S.SourceAccountID = SA.SourceAccountID
LEFT JOIN Leads L --Left joing out to leads and appt so we can see which contacts led to appts
ON P.ProspectID = L.ProspectID
AND L.DateCreated >= @FromDate
AND L.CreatedTypeID = 'T'
LEFT JOIN dbo.SalesAppointments Appt
ON L.LeadID = Appt.LeadID
WHERE CH.CallResultID IN (4, 5, 6, 7, 8, 9, 10) --Only bother counting calls where we spoke to the customer
) RawData
GROUP BY UserID, SourceTypeID, MarketID, CountryID, ProductInterest
HAVING COUNT(*) >= 100 --Only bother creating records where there's at least 100 contacts and 2 appts or it won't be meaningful
AND ISNULL(SUM(Appt), 0) >= 2
Now all I've got to do is work out the same stats for the whole call centre so that I can express the user's performance as a function of the overall average. At this point my brain may be turning into cream cheese...
The best argument against democracy is a five minute conversation with the average voter - Winston Churchill
Hadoop actually sounds more like the way they greet each other in Yorkshire - Inferrd
-
Nov 12th, 2009, 10:21 PM
#5
Re: [RESOLVED] Group by aggregate
Even if its an overnight job sooner or later the performance overhead will haunt you; it will compete for resources with other batch jobs, and data in source tables are increasing on a daily basis.
Rule of thumb with regards to INNER JOINs; eliminate as many records as possible before performing the join. I'm surprised there's no date condition for CH, so you end up checking all calls even from years back.
-
Nov 13th, 2009, 04:01 AM
#6
Re: [RESOLVED] Group by aggregate
I'm surprised there's no date condition for CH
OOPS! Good Catch. I removed the condition because my dev DB is a bit out of date so it was a cheap and cheerful way of getting data back during development. I then completely forgot to put it back in
The best argument against democracy is a five minute conversation with the average voter - Winston Churchill
Hadoop actually sounds more like the way they greet each other in Yorkshire - Inferrd
Posting Permissions
- You may not post new threads
- You may not post replies
- You may not post attachments
- You may not edit your posts
-
Forum Rules
|
Click Here to Expand Forum to Full Width
|