PDA

Click to See Complete Forum and Search --> : Access SQL/Query grouping problem


JoeyCode
Jun 29th, 2000, 07:38 AM
I'm having a problem getting some Access SQL to do what I want for reporting. I have a SCANS table that captures timestamps of student scans for attendance purposes. I want to grab, for each user id that has a scan, the lowest date and the highest date (for a given date range, but I'll worry about that later). To start, I'm just trying to get the highest (max) date, but I also need all the other fields each that record, which is where I'm having the problem. First, here's some sample data:


user_id timestamp station building
1 01/01/1999 12:00:00 am 1 1
1 05/19/2000 11:56:10 pm 2 2
1 05/19/2000 11:57:47 pm 2 2
5 09/27/1999 12:00:00 am 1 1
5 05/19/2000 11:56:10 pm 2 2
5 05/19/2000 11:57:47 pm 2 2
6 05/19/2000 12:00:00 am 1 1
6 05/19/2000 11:56:10 pm 2 2
6 05/19/2000 11:57:47 pm 2 2
7 05/19/2000 11:56:10 pm 2 2
7 05/19/2000 11:57:47 pm 2 2


The user id of the user who was scanned, the timestamp of the scan, and the station and building where it happened (there's more, but I left them out here for example).

Anyway, when I do this query:

SELECT QSA.user_id, Max(QSA.timestamp) AS MaxDate
FROM Q_SCANS_ALL AS QSA
GROUP BY QSA.user_id;

I get the correct results:


user_id MaxDate
1 05/19/00 11:57:47 PM
5 05/19/00 11:57:47 PM
6 05/19/00 11:57:47 PM
7 05/19/00 11:57:47 PM


but I need all the fields. So when I do a query like this:

SELECT QSA.user_id, Max(QSA.timestamp) AS MaxDate, QSA.station, QSA.building
FROM Q_SCANS_ALL AS QSA
GROUP BY QSA.user_id, QSA.station, QSA.building;

(you're required to include all the shown fields in the GROUP BY if they're not in an aggregate function like MAX), I get this:


user_id MaxDate station building
1 01/01/99 12:00:00 AM 1 1
1 05/19/00 11:57:47 PM 2 2
5 09/27/99 12:00:00 AM 1 1
5 05/19/00 11:57:47 PM 2 2
6 05/19/00 12:00:00 AM 1 1
6 05/19/00 11:57:47 PM 2 2
7 05/19/00 11:57:47 PM 2 2


Because it considers each complete record that does not have identical info in the GROUP BY fields to be a different group, each difference in station and building creates a new group for that user and a new MAX date. What I want is to group by just the user id, but still return all the fields in the matching record that has the MAX date in it.

I've tried various things, like using HAVING or some sort of ORDER BY or a subquery, but nothing seems like it'll do what I need. Can anyone think of anything that might help...I didn't think this fairly simple thing would be such a pain. I don't want to have to do this programmatically if I can avoid it.

Thanks,
-JoeyCode

Mongo
Jun 30th, 2000, 12:26 PM
You're on the right track. To keep it simple, save your
"correct resultset" as a query ( SELECT QSA.user_id,
Max(QSA.timestamp) AS MaxDate FROM Q_SCANS_ALL AS QSA
GROUP BY QSA.user_id;).

Then build a second SELECT * query that joins Q_SCANS_ALL
with the user_id and MaxDate from your first.

JoeyCode
Jun 30th, 2000, 01:53 PM
I can't do that, Mongo. The key for that table is user_id, timestamp, station and building combined, so I need to use them all for matching. While it is highly unlikely, it is POSSIBLE that there could be two SCANS records with the same user_id and timestamp (a machine with an incorrect clock, for instance). Yes, it's nearly impossible, but I have to allow for it, so I need to retrieve the entire record (or at least the entire key).

Thanks,
-JoeyCode

JoeyCode
Jun 30th, 2000, 02:49 PM
On second thought, what difference does it make if two or more records have the same user_id/timestamp combo? I just want one of them, I guess it doesn't matter which. But, if anyone can help further with this grouping issue, I'd appreciate it. Thanks Mongo.

-JoeyCode

FrancisC
Jul 2nd, 2000, 01:38 PM
I think your solution is in a subquery. I don't use Access databases but here is an example of a query I have done in Oracle....

The table I am looking at contains records that I need to retrieve by barcode, but one barcode can have several groups or records, identified by an id. I want to retrieve the most recent data.


sSQL = "select dcd.pos_begin, dcd.pos_end, dcd.good from " _
& "draw_cutplans dc, draw_cutplan_details dcd " _
& "where dc.id = dcd.id and dc.id in " _
& "(select max(id) from draw_cutplans " _
& "where bc = '" & frmMain.txtDrawBC.Text & "' group by bc) " _
& "order by dcd.pos_begin desc"


I think you should be able to use the same kind of technique. You would just need a group by user_id clause in your subquery.

Does that help ?

JoeyCode
Jul 5th, 2000, 05:45 PM
Francis, I'm working on something like that...I made this query:


SELECT QSA.user_id, QSA.timestamp AS MaxDate, QSA.station, QSA.building
FROM Q_SCANS_ALL AS QSA
WHERE QSA.timestamp = (SELECT max(QSA2.timestamp)
FROM Q_SCANS_ALL QSA2
WHERE QSA2.user_id = QSA.user_id);


And it works, EXCEPT if there are duplicate timestamps for the same user. Again, that is very unlikely to happen, but it could. So, if I've got some records like this:


user_id timestamp station building
6 05/19/2000 02:50:44 am 1 4
6 05/19/2000 11:57:47 pm 2 3
6 05/19/2000 11:57:47 pm 1 4


I'll end up with two records for user_id 6 instead of just one. I don't care which one, but I just want one. That's where I'm stuck right now.

I can't do a max(station) and max(building), because then I'd end up with:

user_id timestamp station building
6 05/19/2000 11:57:47 pm 2 4

Which is not accurate, station 2 is not in building 4. I need the original record.

Any idea how to filter out duplicate records, using only queries, in this scenario? Any way to have one query supply parameters for another in Access? Or a way to iterate through a series of user_ids in a query, like with a FOR EACH type of loop?

Thanks,
-JoeyCode

Clunietp
Jul 5th, 2000, 10:55 PM
I did this in SQL 7

SELECT *
FROM Products
WHERE productid =
(SELECT TOP 1 MAX(CategoryID)
FROM categories)


this basically shows you to use TOP 1, and because you don't care which record, you don't need to specify any more criteria in your subquery....you could probably also use DISTINCT instead of TOP 1 as well....

I'm pretty sure Access 2000 supports the TOP keyword, but I'm not sure about Access 97

JoeyCode
Jul 8th, 2000, 12:24 AM
Clunietp, Access 97 does support TOP and DISTINCT and all that stuff, but none of those will help in my subquery. I get my MAX timestamp no problem with the query I included in my last message above, but then it matches two different records, both of which are returned.

If I used TOP, I'd get the same result: the subquery, using TOP, would give me a timestamp to match, and the main query would return the two records that match it. I need to be able to then somehow filter to get only one. This must be a tough one, because so far nobody I've asked (and I've asked a lot of people) can figure it out.

Thanks,
-JoeyCode

Clunietp
Jul 8th, 2000, 12:00 PM
So your subquery is just fine, but your main query is the one that gives you the multiple records?

I know for a fact that Access 97 supports Distinct and DistinctRow

JoeyCode
Jul 11th, 2000, 04:26 PM
Yes, Access 97 supports DISTINCT and DISTINCTROW. Did someone say different?

My subquery returns the max timestamp, but the main query returns ALL records that match it. Do you see what I'm getting at here? The subquery only grabs criteria for matching, if I have two or more records that match its result, that's what the main query returns. The question is, how do I filter out all but one? They're not identical records, so DISTINCT and DISTINCTROW will not work.

CGTS
Jul 12th, 2000, 03:34 AM
Here we go..................

This first SQL statement creates a base recordset to group the Max Timestamp with the User_ID. I named this query qryGroupMax
------------------------------------------------------------
SELECT tblScans.user_id, Max(tblScans.timestamp) AS MaxDate
FROM tblScans
GROUP BY tblScans.user_id;
------------------------------------------------------------

Then we use this next query to seperate just one record from that and bring forward the other fields (Station & Building).

Note I had to use the Last() function incase you have to timestamps exactly the same (User_ID + timestamp)
Note also that I had to use to inner joins to get the correct result set.

------------------------------------------------------------
SELECT qryGroupMax.user_id, qryGroupMax.MaxDate, Last(tblScans.station) AS LastOfstation, Last(tblScans.building) AS LastOfbuilding
FROM qryGroupMax INNER JOIN tblScans ON (qryGroupMax.MaxDate = tblScans.timestamp) AND (qryGroupMax.user_id = tblScans.user_id)
GROUP BY qryGroupMax.user_id, qryGroupMax.MaxDate;
------------------------------------------------------------

I obviously did this in Access queries, but it can be done in VB on two recordsets.

AKA
Jul 12th, 2000, 05:58 AM
Should not

SELECT TOP 1 QSA.user_id, QSA.timestamp AS MaxDate, QSA.station, QSA.building
FROM Q_SCANS_ALL AS QSA
WHERE QSA.timestamp = (SELECT max(QSA2.timestamp)
FROM Q_SCANS_ALL QSA2
WHERE QSA2.user_id = QSA.user_id);

do it ?

JoeyCode
Jul 12th, 2000, 06:13 PM
Clunietp and AKA, I don't want just one record, I want one record PER USER. Please read from the top to see what I'm after.

CGTS, the FIRST and LAST functions might just be exactly what I was looking for...thanks!

-JoeyCode

AKA
Jul 13th, 2000, 06:21 AM
:) Ok, I started to read the thread yeasterday and I forgot the start when reading today.

SELECT
QSA.user_id,
Last(QSA.timestamp) AS LastTimeStamp,
Last(QSA.station) AS LastStation,
Last(QSA.building) AS LastBuilding,
First(QSA_F.timestamp) AS FirstTimeStamp,
First(QSA_F.station) AS FirstStation,
First(QSA_F.building) AS FirstBuilding
FROM
tblScans AS QSA,
tblScans AS QSA_F
WHERE
QSA.timestamp = ( SELECT
max(QSA2.timestamp)
FROM
tblSCANS QSA2
WHERE
QSA2.User_Id = QSA.User_Id))
AND
QSA_F.timestamp = ( SELECT
min(QSA2.timestamp)
FROM
tblSCANS QSA2
WHERE
QSA2.User_Id = QSA.User_Id)
AND
QSA_F.user_id = QSA.User_ID
GROUP BY
QSA.user_id ;

Is my new answer to your problem. I do not realy know why it works but in my small Access test database it does. Please conferm if it works for you to. /AKA

JoeyCode
Jul 13th, 2000, 06:24 PM
That's a very clever solution, AKA. I'm wondering if using the QSA2 alias in both the min and max subqueries might be problematic. Also, would it be correct to say that you could use either only the FIRST function or only LAST for all the selections, since QSA has only MAX records and QSA_F has only MIN records?

Your SQL here makes me think of another solution...no need for subqueries or more than one reference to SCANS; just GROUP BY user_id, ORDER BY timestamp, then grab the FIRST and LAST fields, giving the min and max. The only thing about that is I believe there's a relationship between GROUP BY and ORDER BY that might be problem, but I'm gonna try it anyway.

Thanks for your help,
-JoeyCode

AKA
Jul 14th, 2000, 02:32 AM
I dont know if same alias in diffrent subqueries are a problem for SQL Server, I think that they belong to diffrent context and therfore can have the same name.

I belive it is correct to use either the FIRST or LAST for all selections, I used both just to test them since it was first time I used them.

Even better solution comming up ? Good ! Post it so I can learn more as I hope all of us do.

JoeyCode
Jul 17th, 2000, 08:22 PM
For those of you that are interested...

My idea of GROUPing by user_id, ORDERing by timestamp, then grabbing the FIRST and LAST records...it doesn't work. You apparently can't ORDER BY a field that's not included in the GROUP BY, in this case "timestamp", which makes it not possible to sort within the user_id groups and grab the FIRST and LAST like I hoped. And including timestamp in the GROUP BY will create separate groups for every unique timestamp and obviously won't yield the correct results. Too bad, I was hoping it would work.

However, I used a multiple subquery and FIRST function solution like AKA's suggestion above and it works like a charm. Thanks for your help everybody.

Now, look for a new thread about my problems with multiple LEFT JOINS...

-JoeyCode