[RESOLVED] Abnormal Unrolling

Printable View

Apr 17th, 2024, 10:23 AM
Shaggy Hiker

[RESOLVED] Abnormal Unrolling

I can't believe I'm asking this, but here it is: We have a normalized database where somebody insists that it would be better if it were somewhat de-normalized. Everybody else thinks that's a bad idea, and it's not going to happen, but I thought there might be an intermediate solution to have it both ways, and that's what this question is about.

Suppose two tables:

Table 1:

PK
SomeStuff
SomeCount

Table 2:
PK
FK
OtherStuff
MoreOtherStuff.

If I had a query with a LEFT OUTER JOIN, it could look like this:

SELECT SomeStuff,SomeCount,OtherStuff,MoreOtherStuff
From Table1 LEFT OUTER JOIN Table 2 ON Table1.PK = Table2.FK

This will do almost exactly what the person who wants the tables denormalized wants, with one exception.

The SomeCount field in Table 1 could be some integer of 1 or greater. Quite often, if the count is > 1, then there will be no records in Table 2, but there MIGHT be. If I could say for certain that whenever SomeCount > 1, then there will be no record in table 2, then the LEFT OUTER JOIN query would return only one record for each record in Table 1, and a Sum of SomeCount will be correct.

I can't say that, though, so SomeCount could be 3, and the LEFT OUTER JOIN query might result in two or three rows, each with SomeCount = 3, and with whatever data from the two or three rows in Table 2. That would mean that a Sum of SomeCount would be different from just summing the SomeCount in Table 1. In other words:

SELECT SUM(SomeCount) FROM Table1

Would not return the same value as:

SELECT SUM(SomeCount)
From Table1 LEFT OUTER JOIN Table 2 ON Table1.PK = Table2.FK

This would bother me not at all, but it may bother the one guy who insists on denormalizing the database. So, what I'm asking is whether or not there is some reasonable approach that would essentially 'unroll' the case where the LEFT OUTER JOIN results in more than one record?

That's a more difficult problem than it might appear at first, so I don't think there is a REASONABLE way to do this (and it's too stupid a need to waste any time on, when it could be solved so much easier through code). The issue is that if SomeCount > 1, then there need not be SomeCount records in Table 2. In other words, if SomeCount were 5, there could be 3 records in Table 2, so the unrolled view would have to end up looking like this:

SomeStuff,1,OtherStuff,MoreOtherStuff
SomeStuff,1,OtherStuff,MoreOtherStuff
SomeStuff,1,OtherStuff,MoreOtherStuff
SomeStuff,2,Null,Null

Where that last record has a count of 2, as there are two records with no matching records in Table 2.

Is there a simple way to do this?
Apr 17th, 2024, 10:47 AM
techgnome

Re: Abnormal Unrolling

Either a CTE that gets the sum independent of the other table, or a sub query that does the same sounds like the way to solve it ... maybe? Also, it sounds like he wants a Data Warehouse which would have everything denormalized and flattened. again, maybe?

-tg
Apr 17th, 2024, 03:07 PM
Shaggy Hiker

Re: Abnormal Unrolling

Yeah, that's kind of where I'm coming down, also. He'll be the only person using that monstrosity, and I'm not quite sure what he's using it for. As far as I know, he never needs both the count AND the information from table 2 at the same time, so I'm inclined to offer up the LEFT OUTER JOIN with the caveat that he can't sum the Count on that view. If he wants the total count, he can sum Table 1 directly. I don't believe he ever needs both.

The whole thing is absurd to begin with.
Apr 18th, 2024, 01:18 AM
Zvoni

Re: Abnormal Unrolling

Do you have any concrete examples? I'm trying to wrap my mind around how a correct result might look like.

But i tend in tg's direction, too: Sum first, Join second

EDIT: I think i figured it out, what it should look like (missed your last sample)
Give me a bit of time to think on it

I take it, it's MS SQL-Server?

EDIT2: Done in SQLite. Couldn't find a fiddle with MS-SQL

Code:

WITH T2 AS (SELECT FK, OtherStuff, MoreStuff, ROW_NUMBER() OVER(PARTITION BY PK ORDER BY PK) AS RNPK, ROW_NUMBER() OVER(PARTITION BY FK ORDER BY PK) AS RNFK FROM Table2) SELECT T1.SomeStuff, T2.RNPK As SomeCount, T2.OtherStuff, T2.MoreStuff FROM Table1 AS T1 LEFT JOIN T2 ON T1.PK=T2.FK UNION ALL SELECT T3.SomeStuff, T3.SomeCount-COALESCE(MAX(T2.RNFK), 0), Null, Null FROM table1 AS T3 LEFT JOIN T2 ON T3.PK=T2.FK GROUP BY T3.SomeStuff HAVING T3.SomeCount-COALESCE(MAX(T2.RNFK), 0)>0

Returns

SomeStuff SomeCount OtherStuff MoreStuff

Shaggy 1 Hiker1 Burp

Shaggy 1 Hiker2 Fart

Shaggy 1 Hiker3 Gulp

Shaggy 2

If you change SomeCount to 3 in T1, the last line is not returned

EDIT3: Found a fiddle: https://dbfiddle.uk/IYmGLyJG

had to change the query a bit, since MSSQL is stricter than SQLite

Tables

Code:

CREATE TABLE table1 ( PK INTEGER, SomeStuff VARCHAR(255), SomeCount INTEGER ); CREATE TABLE table2 ( PK INTEGER, FK INTEGER, OtherStuff VARCHAR(255), MoreStuff VARCHAR(255) );

Inserts

Code:

INSERT INTO table1 (PK, SomeStuff, SomeCount) VALUES (1, 'Shaggy', 5); INSERT INTO table1 (PK, SomeStuff, SomeCount) VALUES (2, 'DDay', 3); INSERT INTO table2 (PK, FK, OtherStuff, MoreStuff) VALUES (1, 1, 'Hiker1', 'Burp'); INSERT INTO table2 (PK, FK, OtherStuff, MoreStuff) VALUES (2, 1, 'Hiker2', 'Fart'); INSERT INTO table2 (PK, FK, OtherStuff, MoreStuff) VALUES (3, 1, 'Hiker3', 'Gulp');

Query

Code:

WITH T2 AS (SELECT PK, FK, OtherStuff, MoreStuff, 1 As RNPK FROM Table2), T4 AS (SELECT FK, COUNT(PK) AS RNFK FROM T2 GROUP BY FK) SELECT T1.SomeStuff, COALESCE(T2.RNPK, T1.SomeCount) As SomeCount, T2.OtherStuff, T2.MoreStuff FROM Table1 AS T1 LEFT JOIN T2 ON T1.PK=T2.FK UNION ALL SELECT T3.SomeStuff, T3.SomeCount-T4.RNFK, Null, Null FROM table1 AS T3 INNER JOIN T4 ON T3.PK=T4.FK WHERE T3.SomeCount-T4.RNFK>0 ORDER BY T1.SomeStuff, SomeCount

If you need an explanation how it works, just give a shout
Apr 18th, 2024, 09:15 AM
Shaggy Hiker

Re: Abnormal Unrolling

Wow, you even got the MoreStuff items right, and I didn't even include them:eek::D

I might just see whether or not that makes everybody happy, whether it is a bridge too far, or whether people just want to fight about this for the sake of it. Won't be this week, though.
Apr 18th, 2024, 03:35 PM
Shaggy Hiker

Re: Abnormal Unrolling

Yeah, that works. Clever approach. After yet another meeting, I was asked to put it together so that we can whip it out if needed.
Apr 19th, 2024, 01:18 AM
Zvoni

Re: Abnormal Unrolling

Quote:

Originally Posted by Shaggy Hiker

Yeah, that works. Clever approach. After yet another meeting, I was asked to put it together so that we can whip it out if needed.

Just don't Burp and Fart during one of those meetings :D
Apr 19th, 2024, 03:50 AM
yereverluvinuncleber

Re: [RESOLVED] Abnormal Unrolling

Rawhide!
Apr 19th, 2024, 04:26 AM
wqweto

Re: [RESOLVED] Abnormal Unrolling

Here is my attempt

Code:

SELECT SUM(t2.SomeCount) FROM ( SELECT t2.FK , 1 AS SomeCount , t2.OtherStuff , t2.MoreStuff FROM table2 t2 UNION ALL SELECT t1.PK , MAX(t1.SomeCount) - COUNT(t2.PK) AS SomeCount , NULL AS OtherStuff , NULL AS MoreStuff FROM table1 t1 LEFT JOIN table2 t2 ON t2.FK = t1.PK GROUP BY t1.PK HAVING MAX(t1.SomeCount) - COUNT(t2.PK) <> 0 ) t2

And here is my reasoning:
- I'm adding a new column SomeCount to t2 with a constant value of 1.
- I'm adding "missing" rows to t2 with enough SomeCount so that for each t2.FK there is enough SUM(t2.SomeCount) to match the total in t1.SomeCount. This is what the second SELECT after UNION ALL does.
- On new rows the values of OtherStuff, MoreStuff, CoolStuff, etc. are NULLs
- I'm using basic ANSI SQL-92 -- no CTEs, no windowing functions so this should work on MSSQL 2000, MS Access, dBase (probably:-))

cheers,
</wqw>
Apr 19th, 2024, 05:23 AM
Zvoni

Re: [RESOLVED] Abnormal Unrolling

The reason for my way (since wqw explained his, i'll add my explanation to it, since someone else might learn something from it):

1) CTE --> for DBMS not supporting CTE's (e.g. MySQL5.7 and lower) it's easy to resolve the CTE's as nested Queries within its according JOIN
2) T2: I like to "prepare" the Dataset beforehand
3) T4: The reason why T4 selects from T2 (instead from table2 directly) is:
If you "introduce" a Filter ("WHERE ....") on T2 (whether directly within T2 or in the upper half of the UNION, though there within the ON-Clause of the LEFT JOIN --> NOT IN A WHERE-CLAUSE!!),
that Filter propagates automatically to T4! No need to filter in multiple places
4) In the upper half of the UNION i use a LEFT JOIN since we might get an unequal "count" of childrecords compared to "SomeCount". The Coalesce ensures that if NO childrecords exist to return the "SomeCount" from the Parentrecord
5) In the lower half of the UNION i use an INNER JOIN, since the "overhanging SomeCount" is only applicable, if there actually exist Childrecords. It's a simple subtraction with no need for GROUP BY
6) The only aggregation with a GROUP BY is very simple within T4 --> Frankly, Aggregations and its GROUP BY's have driven me barmy in the past, so i try to keep them as simple as possible, nevermind that i loathe Aggregation/Grouping in the "main" Query

EDIT: @1
here without CTE's
https://dbfiddle.uk/MOEO-0I_

Code:

SELECT T1.SomeStuff, COALESCE(T2.RNPK, T1.SomeCount) As SomeCount, T2.OtherStuff, T2.MoreStuff FROM Table1 AS T1 LEFT JOIN (SELECT PK, FK, OtherStuff, MoreStuff, 1 As RNPK FROM Table2) AS T2 ON T1.PK=T2.FK UNION ALL SELECT T3.SomeStuff, T3.SomeCount-T4.RNFK, Null, Null FROM table1 AS T3 INNER JOIN (SELECT FK, COUNT(PK) AS RNFK FROM (SELECT PK, FK, OtherStuff, MoreStuff, 1 As RNPK FROM Table2) GROUP BY FK) AS T4 ON T3.PK=T4.FK WHERE T3.SomeCount-T4.RNFK>0 ORDER BY T1.SomeStuff, SomeCount

Same results,
.... BUT: As you can see, i execute the Query on table2 twice!
Apr 19th, 2024, 08:10 AM
wqweto

Re: [RESOLVED] Abnormal Unrolling

Quote:

Originally Posted by Zvoni

6) The only aggregation with a GROUP BY is very simple within T4 --> Frankly, Aggregations and its GROUP BY's have driven me barmy in the past, so i try to keep them as simple as possible, nevermind that i loathe Aggregation/Grouping in the "main" Query

You cannot possibly do it without aggregation here and using GROUP BY in the "main" is most probable to use any indexing on the base table compared to windowed functions on CTEs on derived tables on nested views.

cheers,
</wqw>
Apr 19th, 2024, 09:21 AM
Shaggy Hiker

Re: [RESOLVED] Abnormal Unrolling

One thing this made me realize is that I've been using something like CTEs for a long time. What we were doing was views of views. The query analyzer doesn't do a good job with those, because it gives up on the optimizing at some point. I came in late to the party, and have stuck with what people were comfortable with, but CTEs would have allowed the query analyzer to work more efficiently. However, since the folks I work with tend to be happier in Access than in SSMS, I'm not sure that they'd be willing to come around to using CTEs. I like to write SQL freehand, while they prefer 'designers' of one sort or another, and try to avoid writing SQL directly. I'm not sure you can write a CTE in SSMS, but you sure can't in Access.
Apr 22nd, 2024, 02:53 AM
Zvoni

Re: [RESOLVED] Abnormal Unrolling

Quote:

Originally Posted by Shaggy Hiker

I'm not sure you can write a CTE in SSMS,

Yes, you can (to quote Obama :-))

Quote:

but you sure can't in Access.

No, you can't (to quote Trump *?*)
Apr 22nd, 2024, 09:29 AM
Shaggy Hiker

Re: [RESOLVED] Abnormal Unrolling

That was poorly stated on my part. I know you can write a CTE in SSMS. What I meant to say was that I'm not sure you can write a CTE in SSMS without writing SQL. If all you do is drag tables into a designer, fiddle with joins, etc., without ever touching the raw SQL, is it still possible to create a CTE?
Apr 22nd, 2024, 11:06 AM
Zvoni

Re: [RESOLVED] Abnormal Unrolling

Quote:

Originally Posted by Shaggy Hiker

That was poorly stated on my part. I know you can write a CTE in SSMS. What I meant to say was that I'm not sure you can write a CTE in SSMS without writing SQL. If all you do is drag tables into a designer, fiddle with joins, etc., without ever touching the raw SQL, is it still possible to create a CTE?

Ahh….. don’t think so.

OTOH, never used those „click and dragdrop“-thingies in any db-client.
always used SQL, so I shouldn’t talk :)

SomeStuff	SomeCount	OtherStuff	MoreStuff
Shaggy	1	Hiker1	Burp
Shaggy	1	Hiker2	Fart
Shaggy	1	Hiker3	Gulp
Shaggy	2