Results 1 to 6 of 6

Thread: Help with Window Functions (SQLite 3.28)

  1. #1

    Thread Starter
    Frenzied Member
    Join Date
    Aug 2010
    Location
    Canada
    Posts
    1,597

    Help with Window Functions (SQLite 3.28)

    For some background - I've been wanting to learn how to use SQLite's new-ish Window Functions feature, and I have finally gotten around to it with a real-world candidate statement to work and compare against.

    I have a project with an SQL statement that works but is fairly slow - it takes about 3 seconds over a couple thousand records (even after adding/tweaking INDEXes). I think that using Window Functions might be a better approach, and in early tests I'm getting an order of a magnitude better performance using them (around 300ms). This is very promising - but the results I' m getting aren't quite correct.

    The SQLite Window Function documentation is in draft and still a bit sparse IMHO, so I'm struggling a bit.

    Let me explain the schema & results I'm look for:

    The table is defined as follows:

    Code:
    CREATE TABLE IF NOT EXISTS test (
    	number TEXT NOT NULL, 
    	createddate TEXT NOT NULL DEFAULT CURRENT_DATE,
    	approved INTEGER NOT NULL DEFAULT 0, 
    	approveddate TEXT,
    	typecode TEXT NOT NULL DEFAULT 'TYPE_0',
    	amount INTEGER NOT NULL,
    	effectiveid TEXT);
    number is a text field (like 001, 002, 003R1, etc...)

    createddate and approveddate hold the date-only portions of an iSO8601 datetime formatted value (e.g. 2020-06-01)

    approved is either 0 (unapproved) or 1 (approved)

    typecode is a string that identifies the "type" of record (e.g. "TYPE_A", "TYPE_B", etc...)

    amount is an unsigned integer number (e.g. -123, 345, etc...)

    effectiveid is an underscore joined string of typecode_((not approved)+1)_effectivedate_number - the ((not approved)+1) ensures that unapproved items are sorted after approved items ((not 0)+1)=2 in SQLite, ((not 1)+1)=1. effectivedate is either the createddate (when approved=0) or the approveddate (when approved=1). Example value: "TYPE_A_1_2020-06-01_002R1"

    The 4 sets of rules for getting the required results are as follows:

    I want the sum of all positive amount values (result #1) and negative amount values (result #2) that come before each record where the typecode and approved values are the same as the current record. Records are considered "before" the current record when the effectiveid value is < the current record's effectiveid value.

    I also want the sum of all positive amount values (result #3) and negative amount values (result #4) that come before each record where the typecode is the same for both the current record and window record, but the approved values are NOT the same (e.g. If a record has approved value of 1 then the sum will be of all previous records with an approved value of 0 (and vice versa)). The same rule as above governs what records are considered "before" the current record: that is, when the effectiveid value is < the current record's effectiveid value.

    I've been able to get correct results for #1 and #2 with the following SQL:

    Code:
    SELECT fullrs.approved, fullrs.amount, fullrs.typecode, fullrs.effectiveid,
    
    TOTAL(fullrs.amount)
    	FILTER ( WHERE
    			  amount > 0
    			) OVER window_previous as previous_positive_amount,
    
    TOTAL(fullrs.amount)
    	FILTER ( WHERE
    			  amount < 0
    			) OVER window_previous as previous_negative_amount 
    
    FROM test fullrs
    
    WINDOW 
    window_previous AS 
    	(PARTITION BY approved, typecode ORDER BY effectiveid RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW EXCLUDE CURRENT ROW)
    
    ORDER BY typecode, effectiveid
    Results for the above exactly match my old, slow SQL statement but much faster (yay!).

    The tricky bit has been getting the correct results for #3 and #4. I've been trying different FILTERs, but I don't know if there's a way to filter the Window based on values from the full recordset - I haven't been able to get it to work at least. It also looked promising to try the <expr> part of the PRECEDING frame spec portion - I think you can put a SELECT statement in there to affect the records that are included in the Window, but every time I tried I received the following error: frame starting offset must be a non-negative number.

    At this point, both my intelligence and Google-fu have failed me, so I'm reaching out here for help. Any pushes in the right direction would be greatly appreciated! Example database included here:

    windowtest.zip
    Last edited by jpbro; Jun 12th, 2020 at 12:08 PM.

  2. #2

    Thread Starter
    Frenzied Member
    Join Date
    Aug 2010
    Location
    Canada
    Posts
    1,597

    Re: Help with Window Functions (SQLite 3.28)

    Apologies - I made a mistake with my requirements for #3 and #4, so if you read those requirements please note that they have changed.

  3. #3
    PowerPoster ChrisE's Avatar
    Join Date
    Jun 2017
    Location
    Frankfurt
    Posts
    2,482

    Re: Help with Window Functions (SQLite 3.28)

    don't know if this is any help but....

    I once had a difficult task creating a ..what should I call it... Identification Number for Camping Products
    they had a smilar value like your..
    effectiveid is an underscore joined string of typecode_((not approved)+1)_effectivedate_number - the ((not approved)+1) ensures that unapproved items are sorted after approved items ((not 0)+1)=2 in SQLite, ((not 1)+1)=1. effectivedate is either the createddate (when approved=0) or the approveddate (when approved=1). Example value: "TYPE_A_1_2020-06-01_002R1"
    it looked like : 12/24V_2018-08-13-AB_1244
    which was for a 12/24V CoolBox
    Production start 2018-08-13
    Produced in Factory : AB_1244
    and more stuff I can't remember

    so I broke that down into Integer or Long values

    I started with the Function of the Coolbox, there was
    1 = 12V
    2 = 12/24Volt
    3= 12/24/230Volt
    4= 12/24/Gas
    ......

    the Production Factory:
    1 = AB_1244
    2 = CS_124
    .....

    get where I'm going ?

    so the Generalnumber for a Product in the end looked like
    2201808131

    instead of
    12/24V_2018-08-13-AB_1244

    so anybody looking for 12/24V products just needed to enter a 2

    in your case..
    55 = unapproved
    66= approved
    77= pending
    .....

    101 = TYPE_A
    102 = TYPE_B
    103 = TYPE_C
    etc...


    you will end up with a Number without any Underscores
    istead of = TYPE_A_1_2020-06-01_002R1


    hth
    Last edited by ChrisE; Jun 13th, 2020 at 03:13 AM.
    to hunt a species to extinction is not logical !
    since 2010 the number of Tigers are rising again in 2016 - 3900 were counted. with Baby Callas it's 3901, my wife and I had 2-3 months the privilege of raising a Baby Tiger.

  4. #4

    Thread Starter
    Frenzied Member
    Join Date
    Aug 2010
    Location
    Canada
    Posts
    1,597

    Re: Help with Window Functions (SQLite 3.28)

    Hi Chris, thank you for the response.

    I like the idea of creating an integer only sort idea - I imagine it would improve performance so I'll give it a try against my old/slow SQL, but it's not exactly what I need to solve the totaling problem I have with window functions.

    I've gotten a bit further with the window function in that I've found a way to get my results right for either approved items (but not unapproved items), or vice versa.

    For example, the following SQL returns the right results for approved items (sum of previous unapproved items) but not unapproved items (it also returns the sum of unapproved items, as opposed to the sum of approved items):

    Code:
    SELECT 
    
    TOTAL(amount) 
    	FILTER ( WHERE
    			  NOT approved AND
    			  amount > 0
    			) OVER window_previous_inverseapproved as previous_positive_inverseapproved
    
    FROM test fullrs 
    
    WINDOW 
    window_previous AS 
    	(PARTITION BY typecode, approved ORDER BY effectiveid RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW EXCLUDE CURRENT ROW),
    
    window_previous_inverseapproved AS 
    		(PARTITION BY typecode 
    		 ORDER BY effectivedate, effectiveid 
    		 RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW EXCLUDE CURRENT ROW)
    Similarly, the following SQL returns the correct results for unapproved rows (sum of previous approved rows), but incorrect results for approved rows (incorrectly returning the sum of the amounts of approved rows instead of correctly returning the sum of amounts of unapproved rows):

    Code:
    SELECT 
    
    TOTAL(amount) 
    	FILTER ( WHERE
    			  approved AND
    			  amount > 0
    			) OVER window_previous_inverseapproved as previous_positive_inverseapproved
    
    FROM test fullrs 
    
    WINDOW 
    window_previous AS 
    	(PARTITION BY typecode, approved ORDER BY effectiveid RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW EXCLUDE CURRENT ROW),
    
    window_previous_inverseapproved AS 
    		(PARTITION BY typecode 
    		 ORDER BY effectivedate, effectiveid 
    		 RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW EXCLUDE CURRENT ROW)
    I've been trying various IIF() conditions, FILTERs, etc. but so far I've been unable to get the sum of the amount of all previously unapproved rows for approved rows AND the sum of the amount of all previously approved rows for unapproved rows - all in a single column. I'll keep at though, and hopefully figure it out. Thanks for your input!
    Last edited by jpbro; Jun 14th, 2020 at 12:23 PM.

  5. #5
    PowerPoster ChrisE's Avatar
    Join Date
    Jun 2017
    Location
    Frankfurt
    Posts
    2,482

    Re: Help with Window Functions (SQLite 3.28)

    Hi,

    well I don't work with SQLite, so I am probably the wrong person in this Thread

    this is how a IIF would look like in Access
    Code:
    SELECT Customers.CustomerID, Customers.CompanyName, IIf([Contact Title]='Sales Representative' And [City]='London',"Yes","No") AS InCity, Sum([Order Details].UnitPrice) AS SummevonUnitPrice
    FROM (Customers INNER JOIN Orders ON Customers.CustomerID = Orders.CustomerID) INNER JOIN [Order Details] ON Orders.OrderID = [Order Details].OrderID
    GROUP BY Customers.CustomerID, Customers.CompanyName, IIf([Contact Title]='Sales Representative' And [City]='London',"Yes","No");
    or a PIVOT with Switch
    Code:
    TRANSFORM Count(Orders.OrderID) AS CountOfOrderID
    SELECT Customers.Country
    FROM Customers INNER JOIN Orders ON Customers.CustomerID = Orders.CustomerID
    GROUP BY Customers.Country
    PIVOT Switch([CompanyName] Like "[A]*","A",
    [CompanyName] Like "[G-Og-o]*","G-O",
    [CompanyName] Like "[P-Zp-z]*","P-Z",True,"Other") 
    In ("A","G-O","P-Z","Other");
    both samples you can try with the Northwind Database


    but you can also execute multiple SQL Statements within ADO to sum diffrrent Values (True /False)
    I can take a look if you supply some Data and what you expect as the Output

    hth
    to hunt a species to extinction is not logical !
    since 2010 the number of Tigers are rising again in 2016 - 3900 were counted. with Baby Callas it's 3901, my wife and I had 2-3 months the privilege of raising a Baby Tiger.

  6. #6

    Thread Starter
    Frenzied Member
    Join Date
    Aug 2010
    Location
    Canada
    Posts
    1,597

    Re: Help with Window Functions (SQLite 3.28)

    Hi Chris - sorry for taking so long to get back to you on this, and thank you for all your help.

    I've temporarily given up on using Window functions, as I just couldn't get them to work. Not a total waste, because I learned a lot from the attempt, and I'll be able to use them in other parts of my application.

    What I've found is that there were 2 areas in my old/slow SQL that were causing it to perform poorly.

    First, I was using a custom collation function which was slowing things down significantly, so I've switched to pre-computing my collation values and then doing a regular ASC sort on those values. This saves me about 1 - 1.5s on the query.

    Second, the SQLite DATE() function is a bit slow (not so noticeable until you are processing hundreds of thousands of records though). I've reworked my SQL to not need it, saving between 500ms and 1s.

    Once the above changes were made, my old approach performs adequately (<1s vs. 3s).

    Thanks again for taking a look at this!

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  



Click Here to Expand Forum to Full Width