dcsimg
Results 1 to 6 of 6

Thread: [RESOLVED] Scalar Function Performance

  1. #1

    Thread Starter
    Super Moderator Shaggy Hiker's Avatar
    Join Date
    Aug 2002
    Location
    Idaho
    Posts
    32,504

    Resolved [RESOLVED] Scalar Function Performance

    We have a tendency to build views of views of views.....of views of views....of views...and so on. For one view that we were putting together, there were a pair of nested CASE statements that looked at whether a different joined in view returned a null. That was okay, and the performance was pretty good. However, we had a FAR uglier computed field we wanted in that view, and I felt that it would result in a view design that would be a nightmare to maintain due to how utterly ugly it would end up, especially since we already had a nested CASE statement, which is ugly enough.

    Therefore, I turned that nested CASE statement into one scalar valued function (SVF), with the intention of turning the far uglier item into a different SVF. However, when I tested out the first SVF, the performance was worse than when using the CASE statement. This surprised me, because the SVF was, on the face of it, a bit better than the view that the CASE statement used.

    The view that the CASE statement used looked roughly like this:

    SELECT aNumber, COUNT(DISTINCT aNumber) As Something
    FROM SomeTable INNER JOIN SomeOtherTable
    ON somethingOrOther INNER JOIN YetAnotherTable
    ON YetAnotherThing
    WHERE (Some large set of conditions ORd together)

    The SVF was using this to get similar information:

    SELECT Top(1) aNumber FROM...

    Everything from the FROM on was identical between the two, except that there was an additional WHERE clause that restricted the selection based on the argument passed to the SVF, which should only have reduced the selection. So, the difference between the view and the SVF was that the view returned two columns, and MANY rows, while the SVF only got the top 1 with a more selective WHERE.

    The SVF was also doing the CASE, which looked like this:

    (CASE WHEN (ThatView IS NULL) THEN (CASE WHEN (ANearlyIdenticalView IS NULL) THEN 'Natural' ELSE 'Unknown' END) ELSE 'Hatchery' END) AS Origin

    So, the SVF actually looked like this:

    Set @Return = (SELECT CASE WHEN (SELECT Top (1) aNumber...etc.) IS NULL THEN (CASE WHEN (SELECT Top(1) aDiffNumber...etc.) IS NULL THEN 'Natural' ELSE 'Unknown' END) ELSE 'Hatchery')

    What I'm trying to get at here, is that the CASE in the view of views made use of two other views. In those two other views, I was really only asking whether there were any rows returned, but was getting two fields, anyways, one of which was a COUNT. In the SVF, I was getting rid of that COUNT(), as it wasn't all that useful, and was only getting the Top (1), if there was one.

    The SVF was also combining the two views into a single function, which used a CASE to determine the final result. However, the SVF ran a bit slower than using the view of a view of a view, even though the SQL in the SVF was simpler than the SQL in the view of views. Furthermore, the SVF could add a far more restrictive condition for its set of WHERE conditions, that should have resulted in far fewer rows (one rather than thousands or tens of thousands).

    I'm not sure that I explained that clearly, but I'm trying to get at why an SVF would run slower than the same thing in a view of views?
    My usual boring signature: Nothing

  2. #2
    Super Moderator FunkyDexter's Avatar
    Join Date
    Apr 2005
    Location
    An obscure body in the SK system. The inhabitants call it Earth
    Posts
    6,982

    Re: Scalar Function Performance

    It's hard to be sure without sitting at your desk and unpicking the whole thing personally but I suspect your problem is right here:-
    the view returned two columns, and MANY rows, while the SVF only got the top 1 with a more selective WHERE.
    At first you might think that's better but it's actually worse. While the view might do more work internally, the function is actually much harder for the engine to consume externally.

    The function must be called for every single row in the consuming dataset and therefore generates an underlying cursor on the consuming dataset. This is know as a RBAR (Row By Agonising Row) approach and stuffs performance pretty bad.

    The view, on the other hand, gets resolved once and is joined to your consuming dataset in a set based manner. In fact, while you may look at the view in isolation and think it's returning many rows, it's probably only returning a few when joined to by the consuming query. That's far more efficient for SQL Server to process.

    As a rule of thumb (and there's always exceptions) you should avoid ever using scalar functions in joins or filters (I know you didn't, I'm just being a completist). And you should favour joining (or cross applying) to table functions and views if you need a result in the select list if the dataset is large and the number of distinct values being selected is small.



    Edit> As an additional tip, Select top 1 may not be quicker that the select count (though it's unlikely to be worse). If you're using an Order By it certainly isn't because it would have to resolve the entire dataset. Without an Order by I'm less sure. I think it might still resolve the entire dataset.

    If all you're checking is for the existence of a record (I think that's all you're doing) then the best way to do it is use the EXISTS clause. This is explicit so the query engine knows it does not have to resolve the whole dataset. If you've got the right indexes in place it'll resolve it straight off the stats without needing to touch the table at all.




    Edit2> Thought I'd add that the View of Views architecture is bad and you should get rid of it if you can. It may be performing better than the scalar function but it's still probably compromising the engine's ability to build a decent query plan. If you're in a position to it would be best to flatten those out to as few levels as possible. Ideally just a straight query against the raw tables

    Views are great and allow you to re-use logic which is, of course, a good thing. But once you more than one or two levels deep the engine start to really struggle to optimise them. It a trade off between performance and DRY principles.
    Last edited by FunkyDexter; Sep 24th, 2018 at 04:47 AM.
    You can depend upon the Americans to do the right thing. But only after they have exhausted every other possibility - Winston Churchill

    Hadoop actually sounds more like the way they greet each other in Yorkshire - Inferrd

  3. #3
    PowerPoster techgnome's Avatar
    Join Date
    May 2002
    Posts
    31,829

    Re: Scalar Function Performance

    Yeah, it seems like it would be counterintuitive, but funky pretty much hit it on the head... I've run into similar issues when working with SVFs as well. Oddly, converting them to TVFs, adding a little bit of info and then joining to it actually ends up improving performance - sometimes. The key I find is remembering that SQL works best on set-based data... it's really bad with row-based data operations, so when you can do operations that work on larger datasets than individual rows, the better, more efficient it's going to be.

    -tg
    * I don't respond to private (PM) requests for help. It's not conducive to the general learning of others.*
    * I also don't respond to friend requests. Save a few bits and don't bother. I'll just end up rejecting anyways.*
    * How to get EFFECTIVE help: The Hitchhiker's Guide to Getting Help at VBF - Removing eels from your hovercraft *
    * How to Use Parameters * Create Disconnected ADO Recordset Clones * Set your VB6 ActiveX Compatibility * Get rid of those pesky VB Line Numbers * I swear I saved my data, where'd it run off to??? *

  4. #4

    Thread Starter
    Super Moderator Shaggy Hiker's Avatar
    Join Date
    Aug 2002
    Location
    Idaho
    Posts
    32,504

    Re: Scalar Function Performance

    Yeah, that probably covers the situation correctly. I can't do anything about indexes in this database...at least I'm pretty sure I can't...safely, but that's a political matter.

    The views of views is something we tend to do a whole lot of. Partly it's convenience, cause some people know that better. The issue I run into is that combining things into one massive query turns out to be kind of horrible looking. In this case, that's probably not true....yet. We just never seem to reach an end as to what is actually needed.
    My usual boring signature: Nothing

  5. #5
    Super Moderator FunkyDexter's Avatar
    Join Date
    Apr 2005
    Location
    An obscure body in the SK system. The inhabitants call it Earth
    Posts
    6,982

    Re: [RESOLVED] Scalar Function Performance

    The views of views is something we tend to do a whole lot of.
    It's a tricky balancing act. Views certainly make your logic more "bite size" and more re-usable. Table Valued functions do the same. And mostly they don't cause problems as the optimiser can in-line the underlying query to produce an efficient execution plan.

    Trouble is, there comes a point where the optimiser just can't get it's head around the complexity of it and produces an inefficient plan, at that point the performance drops off a cliff. You can't really predict when that points going to come. Personally I reckon that if your burying your views/tvfs more than a level deep (or two at the most) then you're starting to ask for trouble. It's just a rule of thumb though and I've seen them go deeper with no issues.

    The one thing I would watch for is when you have a view based on a table and, and in your consuming query you join that view to the same table to grab some extra info. That introduces a new instance of the table (or at least a subset of it) with a join for something that could have been resolved from a single instance. One of these occurring in a view probably isn't too bad but as the number increases the complexity ramps up quick. It's probably not the most damaging thing for performance but it's a really easy one to spot.
    You can depend upon the Americans to do the right thing. But only after they have exhausted every other possibility - Winston Churchill

    Hadoop actually sounds more like the way they greet each other in Yorkshire - Inferrd

  6. #6

    Thread Starter
    Super Moderator Shaggy Hiker's Avatar
    Join Date
    Aug 2002
    Location
    Idaho
    Posts
    32,504

    Re: [RESOLVED] Scalar Function Performance

    Looks like we're sticking with views of views. I brought up the subject, but the view of my colleagues in this matter is that they like views, so views it will be unless it causes real issues for us. Right now, it is not.
    My usual boring signature: Nothing

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  



Featured


Click Here to Expand Forum to Full Width