Results 1 to 18 of 18

Thread: [RESOLVED] Puzzling Stored Procedure

  1. #1

    Thread Starter
    Super Moderator Shaggy Hiker's Avatar
    Join Date
    Aug 2002
    Location
    Idaho
    Posts
    38,989

    Resolved [RESOLVED] Puzzling Stored Procedure

    It's not a good title, but then again, I'm having a hard time even classifying this problem. I have a stored procedure that is supposed to return a string called an LLID that identifies a stream. The way this works is that we have a database with thousands of points along a series of streams. For any new waypoint (the arguments to the stored procedure), find the existing waypoint that the new waypoint is closest to using the Haversine formula. It is assumed that the stream that the closest existing waypoint is on, will be the same stream that the waypoint in question is on. Naturally, this could be a problem at the confluence of two streams, but that edge case isn't relevant to this question.

    For the most part, this works correctly, but occasionally it does not, and in a very puzzling fashion. Here's the stored procedure:

    Code:
    SET NOCOUNT ON;
    	SELECT @Result = (SELECT Top(1) STRM_Survey.LLID 
            FROM STRM_Survey 
            INNER JOIN STRM_Survey_Waypoints 
              ON STRM_Survey.SurveyID = STRM_Survey_Waypoints.SurveyID 
            WHERE @Latitude IS NOT Null 
            AND @Longitude IS NOT Null  
            AND STRM_Survey_Waypoints.Latitude = @Latitude 
            AND STRM_Survey_Waypoints.Longitude = @Longitude)
    	
    	IF @Result IS Null 
    	BEGIN
    		SELECT @Result = (SELECT TOP(1) STRM_Survey.LLID 
                    FROM STRM_Survey INNER JOIN 
    			(SELECT TOP(1) SurveyID, 
    			MIN(( 6371 * acos( cos( radians(@Latitude) )  
    			  * cos( radians( STRM_Survey_Waypoints.Latitude ) ) 
    			  * cos( radians( @Longitude ) - 
                              radians(STRM_Survey_Waypoints.Longitude) ) 
                              + sin( radians(@Latitude) ) 
    			  * sin(radians(STRM_Survey_Waypoints.Latitude)) ) )) AS Res 
                      FROM STRM_Survey_Waypoints 
    	          WHERE @Latitude IS NOT Null 
                      AND @Longitude IS NOT Null  
                      AND STRM_Survey_Waypoints.Latitude 
                         BETWEEN @Latitude - (20/111.045)
                         AND @Latitude + (20/111.045) 
                      AND STRM_Survey_Waypoints.Longitude 
                        BETWEEN @Longitude - (50/(111.045 * COS(RADIANS(@Latitude)))) 
                        AND @Longitude + (50/(111.045 * COS(RADIANS(@Latitude))))
    	          GROUP BY SurveyID ORDER BY res) AS A 
                     ON STRM_Survey.SurveyID = A.SurveyID);
    	END
    
    	RETURN 0
    That's a bit ugly, but mostly because the Haversine formula is ugly. It can be simplified down to this, as I'm quite confident that the Haversine implementation is correct:

    Code:
    SET NOCOUNT ON;
    	SELECT @Result = (SELECT Top(1) STRM_Survey.LLID 
                FROM STRM_Survey 
                INNER JOIN STRM_Survey_Waypoints ON STRM_Survey.SurveyID = 
                               STRM_Survey_Waypoints.SurveyID 
                WHERE @Latitude IS NOT Null 
                AND @Longitude IS NOT Null  
                AND STRM_Survey_Waypoints.Latitude = @Latitude 
                AND STRM_Survey_Waypoints.Longitude = @Longitude)
    	
    	IF @Result IS Null 
    	BEGIN
    		SELECT @Result = (SELECT TOP(1) STRM_Survey.LLID 
                    FROM STRM_Survey INNER JOIN 
    			(SELECT TOP(1) SurveyID, 
    			 MIN(DistBetweenTwoWaypoints) AS Res
                             FROM STRM_Survey_Waypoints 
    			 WHERE @Latitude IS NOT Null 
                             AND @Longitude IS NOT Null  
                             AND STRM_Survey_Waypoints.Latitude BETWEEN 
                               @Latitude - (20/111.045) 
                               AND @Latitude + (20/111.045) 
                            AND STRM_Survey_Waypoints.Longitude BETWEEN 
                               @Longitude - (50/(111.045 * COS(RADIANS(@Latitude)))) 
                               AND @Longitude + (50/(111.045 * COS(RADIANS(@Latitude))))
    			  GROUP BY SurveyID ORDER BY res) AS A 
                              ON STRM_Survey.SurveyID = A.SurveyID);
    	END
    
    	RETURN 0
    The first part checks for an exact match between the new waypoint and an existing waypoint, because that would skip all the nasty math, so it's worth a quick check.

    I was thinking that first part might be a clue to the issue, but it doesn't appear to be.

    The issue is that I have a test problem in a program that calls this SP for a dozen waypoints scattered along a stream. For 10 of the waypoints, the correct stream is returned, but for two of them, an incorrect stream is returned.

    If that's all there was, it would be a simple enough bug, but the problem gets strange from here. When I created the test data, I took three legitimate waypoints and copied them four times to create a dozen waypoints. I then changed the names (which aren't used in any way, so that doesn't matter) of the waypoints and altered the latitude or longitude of the waypoints by a trivial amount. For example, between a working point and a non-working point, the longitudes are the same, while the latitudes are:

    Working: 44.01617797
    Not Working: 44.01617787

    That's not much of a difference. I can barely see the difference in Google Maps.

    The next point is that these waypoints are nowhere near the confluence of the two rivers. In fact, the incorrect river is a few hundred meters away.

    Since I quadruplicated the three waypoints, there are four waypoints that are virtually the same. One is in the DB, so it will perfectly match an existing waypoint and will get that LLID. None of the other three perfectly match an existing waypoint, and two of them return the incorrect LLID, while the fourth returns the correct LLID. The first two differ slightly in latitude, the fourth differs slightly in longitude, but they are all so close together as to be virtually identical.

    What's truly fascinating is that, when anything ever fails with the table that the LLID values come from (which isn't referenced in that SP), then the same stream is always returned. Any failure returning streams returns that one stream, and the reason has never been discovered. It's become some kind of crazy default. The LLID is neither the first nor the last, it sorts neither first nor last, the name is neither first nor last. I also can't tell whether this is a case of that mystery default, or not.
    My usual boring signature: Nothing

  2. #2
    MS SQL Powerposter szlamany's Avatar
    Join Date
    Mar 2004
    Location
    Connecticut
    Posts
    18,263

    Re: Puzzling Stored Procedure

    SELECT TOP 1 needs an ORDER BY, imo. Always needs one...

    Do you have one that would make sense?

    *** Read the sticky in the DB forum about how to get your question answered quickly!! ***

    Please remember to rate posts! Rate any post you find helpful - even in old threads! Use the link to the left - "Rate this Post".

    Some Informative Links:
    [ SQL Rules to Live By ] [ Reserved SQL keywords ] [ When to use INDEX HINTS! ] [ Passing Multi-item Parameters to STORED PROCEDURES ]
    [ Solution to non-domain Windows Authentication ] [ Crazy things we do to shrink log files ] [ SQL 2005 Features ] [ Loading Pictures from DB ]

    MS MVP 2006, 2007, 2008

  3. #3

    Thread Starter
    Super Moderator Shaggy Hiker's Avatar
    Join Date
    Aug 2002
    Location
    Idaho
    Posts
    38,989

    Re: Puzzling Stored Procedure

    Further testing makes it even more peculiar. I took out the inner portion of the query, which is this part:

    Code:
    (SELECT TOP(1) SurveyID, 
    			MIN(( 6371 * acos( cos( radians(@Latitude) )  
    			  * cos( radians( STRM_Survey_Waypoints.Latitude ) ) 
    			  * cos( radians( @Longitude ) - radians(STRM_Survey_Waypoints.Longitude) ) + sin( radians(@Latitude) ) 
    			  * sin(radians(STRM_Survey_Waypoints.Latitude)) ) )) AS Res FROM STRM_Survey_Waypoints 
    			  WHERE @Latitude IS NOT Null 
                              AND @Longitude IS NOT Null  
                              AND STRM_Survey_Waypoints.Latitude 
                                 BETWEEN @Latitude - (20/111.045) 
                                 AND @Latitude + (20/111.045) 
                              AND STRM_Survey_Waypoints.Longitude 
                                 BETWEEN @Longitude - (50/(111.045 * COS(RADIANS(@Latitude)))) 
                                 AND @Longitude + (50/(111.045 * COS(RADIANS(@Latitude))))
    			  GROUP BY SurveyID ORDER BY res)
    Naturally, this returns a single Survey ID. The outer query joins that Survey ID to a Survey so that I can get the LLID from the Survey.

    I took out this part of the query and ran it on it's own. I removed the checks for Null in the WHERE clause, because there weren't any arguments, and I hard coded in the same lat and lon that were returning the wrong stream. This returned a single Survey ID, as it should. When I looked at that Survey ID, it had the right LLID, not the wrong one.

    I then took that whole part of the query from the SP and made the same change. Sure enough, it produced the right LLID.

    So, I'm baffled. The SP has two parts. The first part should return no records for the test data, and if separated out into its own query with the test data appended directly into it, it does produce no records. The second part should return a certain value, and if separated out into its own query with the test data appended directly into it, it does produce the right value.

    And yet, despite each part doing the right thing, the SP produces the wrong value. Better yet, I was able to get it down to an incredibly fine level. A difference of 1 in the sixth decimal place for Latitude, and none in longitude, will cause the incorrect value to be returned in the SP, whereas if the Latitude is off by 1 in the sixth place, then altering the longitude by 2 in the fourth decimal place will get the thing right.

    Basically, it almost looks like it is doing what it should. If you move the point a bit away from where it should be, it should snap to a different creek, but the changes I'm making would shift the point by a meter or two, at most, and it's jumping to a stream that is several hundred meters away. Even worse, if I run the queries directly, not together as part of an SP, they return the correct values.

    What the heck????
    My usual boring signature: Nothing

  4. #4

    Thread Starter
    Super Moderator Shaggy Hiker's Avatar
    Join Date
    Aug 2002
    Location
    Idaho
    Posts
    38,989

    Re: Puzzling Stored Procedure

    Quote Originally Posted by szlamany View Post
    SELECT TOP 1 needs an ORDER BY, imo. Always needs one...

    Do you have one that would make sense?
    I was writing, and didn't see this.

    I assume you mean for the outer SELECT TOP (1), because the inner one has an ORDER BY.

    In fact, for the outer SELECT TOP (1), that TOP (1) probably doesn't need to be there. The SurveyID is the PK of the STRM_Survey table, so if I am joining on STRM_Survey.SurveyID = A.SurveyID, I believe there can't be more than one. After all, A.SurveyID already has a Top (1) that DOES have an ORDER BY clause, so there can only be one Survey ID in A, and that can only result in one survey in STRM_Survey, since it's the PK of the table.

    However, as I then noted, when I run the query outside of the SP, it produces the right result. I also confirmed that both the inner SELECT TOP (1) and the outer SELECT TOP (1) both do return only one value for the test data. The outer one returns the correct value, too.
    My usual boring signature: Nothing

  5. #5
    PowerPoster
    Join Date
    Nov 2017
    Posts
    3,116

    Re: Puzzling Stored Procedure

    Quote Originally Posted by Shaggy Hiker View Post
    That's a bit ugly, but mostly because the Haversine formula is ugly. It can be simplified down to this, as I'm quite confident that the Haversine implementation is correct:
    I'll admit up front that I had to look up what the Haversine formula is to refresh my memory. Comparing the formula from Wikipedia to your code, it seems, to me, and at first glance, like they are not equivalent, even when accounting for various trig identities and such. I could be wrong, but that's my initial impression.

    https://en.wikipedia.org/wiki/Haversine_formula

  6. #6

    Thread Starter
    Super Moderator Shaggy Hiker's Avatar
    Join Date
    Aug 2002
    Location
    Idaho
    Posts
    38,989

    Re: Puzzling Stored Procedure

    Well, that was always my initial impression, too, and it has been a while since I've tested it with known values, but at one time, it appeared to be returning correct values.

    Still, it doesn't explain why the SP and an ad hoc query with hard coded values for lat and long, are returning different results...for just a few test points.
    My usual boring signature: Nothing

  7. #7
    PowerPoster
    Join Date
    Nov 2017
    Posts
    3,116

    Re: Puzzling Stored Procedure

    Quote Originally Posted by Shaggy Hiker View Post
    Well, that was always my initial impression, too, and it has been a while since I've tested it with known values, but at one time, it appeared to be returning correct values.

    Still, it doesn't explain why the SP and an ad hoc query with hard coded values for lat and long, are returning different results...for just a few test points.
    I would recommend implementing the exact formula outlined in the Wikipedia article, which uses arcsin. The comments at the bottom of the "Formulation" section may be directly relevant to what you are seeing, mentioning that, when tackling spherical distance from the perspective of cosine, for points that are close together, you may see inaccurate results.

  8. #8
    MS SQL Powerposter szlamany's Avatar
    Join Date
    Mar 2004
    Location
    Connecticut
    Posts
    18,263

    Re: Puzzling Stored Procedure

    Quote Originally Posted by Shaggy Hiker View Post
    I assume you mean for the outer SELECT TOP (1), because the inner one has an ORDER BY.
    I admit that I got lost in that sea of SQL text - wow!

    I would forget the SP method all together and go with a table-valued function. Passing in just the one SURVEYID and return a table with all the rows related to that spot in the river with the scores and ARCSIN values, and what not, all in fields in those rows. Put a ROW_NUMBER() OVER (ORDER BY...) in place to put a "1" at the best row, if that makes sense in the function itself.

    This allows you to be more programmatic and less "set-based-logic'd"

    One of my more painful TVF's included - just to push you over the edge

    Code:
    Create Function dbo.cdStuCourseSection_TVF(@Term varchar(100), @CrsSec varchar(100))
    Returns @cdStuCourseSection Table (TermId int, SCS_COURSE_SECTION varchar(100), SCS_STUDENT varchar(100), FIRST_NAME varchar(100), LAST_NAME varchar(100), SEC_SHORT_TITLE varchar(100), SEC_NAME varchar(100), SEC_MEETING_INFO varchar(1000)
    		, SEC_START_DATE datetime, SEC_END_DATE datetime, STUDENT_CURRENT_TYPE varchar(100)
    		, DualCredit varchar(1), HighSchool varchar(100), NumGradeFinal varchar(100)
    		, STUDENT_COURSE_SEC_ID varchar(100), FACULTY_ID varchar(100))--, SCS_STUDENT_ACAD_CRED varchar(100))
    Begin
    	Declare @ShowHS int
    	Set @ShowHS=0
    	If Right(@CrsSec,1)='*'
    	Begin
    		Set @ShowHS=1
    		Set @CrsSec=Left(@CrsSec,Len(@CrsSec)-1)
    	End
    	Insert into @cdStuCourseSection
    		Select TT.TermId, SCS.SCS_COURSE_SECTION Collate Database_Default			"SCS_COURSE_SECTION"
    			, SCS_STUDENT Collate Database_Default				"SCS_STUDENT"
    			, STU.FIRST_NAME Collate Database_Default			"FIRST_NAME"
    			, STU.LAST_NAME Collate Database_Default			"LAST_NAME"
    			, CS.SEC_SHORT_TITLE Collate Database_Default		"SEC_SHORT_TITLE"
    			, CS.SEC_NAME Collate Database_Default				"SEC_NAME"
    			, SEC_MEETING_INFO Collate Database_Default			"SEC_INFO"
    			, CS.SEC_START_DATE
    			, CS.SEC_END_DATE
    			, S.STUDENT_CURRENT_TYPE Collate Database_Default	"STUDENT_CURRENT_TYPE"
    			, Case When DC_ID IS NOT NULL Then 'Y'
    											Else '' End			"DualCredit"
    			, Cast('' as varchar(100))							"HighSchool"
    			, Cast('' as varchar(100))							"NumGradeFinal"
    			, SCS.STUDENT_COURSE_SEC_ID
    			, CSF_FACULTY Collate Database_Default			"FACULTY_ID"
    			--, SCS.SCS_STUDENT_ACAD_CRED
    			--,STC.*,*
    			From [COLL-DATA-HOST].[coll18_prod].[dbo].[COURSE_SECTIONS] CS  with (NOLOCK)
    			Left Join [COLL-DATA-HOST].[coll18_prod].[dbo].[STUDENT_COURSE_SEC] SCS  with (NOLOCK) ON SCS.SCS_COURSE_SECTION=CS.COURSE_SECTIONS_ID 
    			Left Join [COLL-DATA-HOST].[coll18_prod].[dbo].[STC_STATUSES] STC with (NOLOCK) ON STC.STUDENT_ACAD_CRED_ID=SCS.SCS_STUDENT_ACAD_CRED 
    			Left Join [COLL-DATA-HOST].[coll18_prod].[dbo].[PERSON] STU with (NOLOCK) ON STU.ID=SCS.SCS_STUDENT
    			Left Join [COLL-DATA-HOST].[coll18_prod].[dbo].[N05_STUDENT_TERMS_VIEW] S with (NOLOCK) on S.STTR_STUDENT=STU.ID and S.STTR_TERM=@Term
    			left join [COLL-DATA-HOST].[coll18_prod].[dbo].COURSE_SECTIONS_LS ls1 WITH (NOLOCK) on CS.COURSE_SECTIONS_ID = ls1.COURSE_SECTIONS_ID and ls1.POS = 1
    			left join [COLL-DATA-HOST].[coll18_prod].[dbo].COURSE_SEC_FACULTY csf WITH (NOLOCK) on ls1.SEC_FACULTY = csf.COURSE_SEC_FACULTY_ID and ls1.POS = 1
    			Left Join Terms_T TT on TT.Term=@Term
    			left join [COLL-DATA-HOST].[coll18_prod].[dbo].N05_DUAL_CREDIT WITH (NOLOCK) on DC_ID = STU.ID
    			Where CS.SEC_TERM=@Term and CS.COURSE_SECTIONS_ID=@CrsSec AND STC.POS=1 AND STC.STC_STATUS NOT IN ('X','D','C','W','DR')
    				and ((S.STTR_ACAD_LEVEL='CE' and @Term like '%Q%') or (S.STTR_ACAD_LEVEL<>'CE' and @Term not like '%Q%'))
    
    	If @ShowHS=1
    	Begin
    		Update @cdStuCourseSection Set HighSchool=(Select HIGH_SCHOOL_NAME Collate Database_Default From [COLL-DATA-HOST].[coll18_prod].[dbo].[HIGHSCHOOL_VIEW] 
    													Where PERSON_ID Collate Database_Default = SCS.SCS_STUDENT)	   
    				From @cdStuCourseSection SCS
    				Where SCS.DualCredit='Y'
    		Update @cdStuCourseSection Set NumGradeFinal=SG.NumGradeFinal
    				From @cdStuCourseSection SCS
    				Left Join StaffGrade_T SG on SG.TermId=SCS.TermId and SG.CrsSec=SCS.SCS_COURSE_SECTION and SG.MasId=SCS.SCS_STUDENT
    	End
    
    	Declare @SCSI varchar(100)
    	Declare @FG varchar(100)
    	Declare @FG2 varchar(100)
    	Set @SCSI=(Select Min(STUDENT_COURSE_SEC_ID) From @cdStuCourseSection)
    	While @SCSI is not null
    	Begin
    		Select @FG=SAC.STC_FINAL_GRADE, @FG2=SAC.STC_VERIFIED_GRADE From [COLL-DATA-HOST].[coll18_prod].[dbo].[STUDENT_ACAD_CRED] SAC Where SAC.STC_STUDENT_COURSE_SEC=@SCSI
    		If IsNull(@FG,'')='I' or IsNull(@FG2,'')='I' Delete from @cdStuCourseSection Where STUDENT_COURSE_SEC_ID=@SCSI
    		Set @SCSI=(Select Min(STUDENT_COURSE_SEC_ID) From @cdStuCourseSection Where STUDENT_COURSE_SEC_ID>@SCSI)
    	End
    
    	Return
    End

    *** Read the sticky in the DB forum about how to get your question answered quickly!! ***

    Please remember to rate posts! Rate any post you find helpful - even in old threads! Use the link to the left - "Rate this Post".

    Some Informative Links:
    [ SQL Rules to Live By ] [ Reserved SQL keywords ] [ When to use INDEX HINTS! ] [ Passing Multi-item Parameters to STORED PROCEDURES ]
    [ Solution to non-domain Windows Authentication ] [ Crazy things we do to shrink log files ] [ SQL 2005 Features ] [ Loading Pictures from DB ]

    MS MVP 2006, 2007, 2008

  9. #9

    Thread Starter
    Super Moderator Shaggy Hiker's Avatar
    Join Date
    Aug 2002
    Location
    Idaho
    Posts
    38,989

    Re: Puzzling Stored Procedure

    Yeah, it's already pushing me over the edge.

    I thought that the Haversine implementation I used followed ESRI:
    https://community.esri.com/t5/coordi...20two%20points.

    It doesn't appear to, though, so I'm not sure where I got that from.
    My usual boring signature: Nothing

  10. #10

    Thread Starter
    Super Moderator Shaggy Hiker's Avatar
    Join Date
    Aug 2002
    Location
    Idaho
    Posts
    38,989

    Re: Puzzling Stored Procedure

    I switched that to use the second answer, here:

    https://stackoverflow.com/questions/...nition-for-sql

    which references the Wiki article. It doesn't change anything. Whether it gives the same answer for distance is a bit irrelevant since it doesn't return the right stream.

    You may also note that my original calculation matched the question in that link. However, when comparing the two head to head, my original approach returned 0, while the new approach returned 1.1x10-5, which is so close to zero as makes no difference, at least to me. Therefore, whether the two are all that much different, I can't say. What I can say is that it's beyond my geometric skills to know how close the one matches the other, and that both of them return the wrong value from the SP, but the right value if run as ad hoc queries.
    My usual boring signature: Nothing

  11. #11

    Thread Starter
    Super Moderator Shaggy Hiker's Avatar
    Join Date
    Aug 2002
    Location
    Idaho
    Posts
    38,989

    Re: Puzzling Stored Procedure

    @szlamany: I don't have the SurveyID, I think, unless I am misunderstanding you. The inner Select in my SP is getting a SurveyID, then using that to find an LLID from a different table. I'm passing in a lat/lon pair, and using that to get a SurveyID.

    Of course, after all this testing, there's an obvious answer: The query works, so long as it isn't in an SP. I could abandon the SP, do the same work in code, and get the right answer. Every test I have run shows that the SQL works correctly when taken out of the SP.

    The problem with that answer should be obvious: How unsettling is it if a query returns different results if it is part of an SP or not?
    My usual boring signature: Nothing

  12. #12

    Thread Starter
    Super Moderator Shaggy Hiker's Avatar
    Join Date
    Aug 2002
    Location
    Idaho
    Posts
    38,989

    Re: Puzzling Stored Procedure

    Curiouser and curiouser.

    After doing a bit of research, I decided to run a pair of ad hoc queries:

    The first one:
    Code:
    DECLARE @Latitude float = 44.01617787
    DECLARE @Longitude float = -114.845692
    
    SELECT TOP(1) SurveyID, 
    			MIN( 2 * 6335 * asin(sqrt(power(sin((radians(@Latitude) - radians(STRM_Survey_Waypoints.Latitude)) / 2), 2)
                + cos(radians(STRM_Survey_Waypoints.Latitude))
                * cos(radians(@Latitude))
                * power(sin((radians(@Longitude) - radians(STRM_Survey_Waypoints.Longitude)) / 2), 2)
            )) ) AS Res FROM STRM_Survey_Waypoints 
    			  WHERE @Latitude IS NOT Null AND @Longitude IS NOT Null  AND STRM_Survey_Waypoints.Latitude BETWEEN @Latitude - (20/111.045) AND @Latitude + (20/111.045) AND STRM_Survey_Waypoints.Longitude BETWEEN @Longitude - (50/(111.045 * COS(RADIANS(@Latitude)))) AND @Longitude + (50/(111.045 * COS(RADIANS(@Latitude))))
    			  GROUP BY SurveyID ORDER BY res
    The second one:
    Code:
    SELECT TOP(1) SurveyID, 
    			MIN( 2 * 6335 * asin(sqrt(power(sin((radians(44.01617787) - radians(STRM_Survey_Waypoints.Latitude)) / 2), 2)
                + cos(radians(STRM_Survey_Waypoints.Latitude))
                * cos(radians(44.01617787))
                * power(sin((radians(-114.845692) - radians(STRM_Survey_Waypoints.Longitude)) / 2), 2)
            )) ) AS Res FROM STRM_Survey_Waypoints 
    			  WHERE 44.01617787 IS NOT Null AND -114.845692 IS NOT Null  AND STRM_Survey_Waypoints.Latitude BETWEEN 44.01617787 - (20/111.045) AND 44.01617787 + (20/111.045) AND STRM_Survey_Waypoints.Longitude BETWEEN -114.845692 - (50/(111.045 * COS(RADIANS(44.01617787)))) AND -114.845692 + (50/(111.045 * COS(RADIANS(44.01617787))))
    			  GROUP BY SurveyID ORDER BY res
    I didn't take any time to format either one of these for the post. However, they are virtually the same. In fact, I would expect them to be almost exactly the same. In the first one, I declared a pair of float variables and used those in the subsequent SELECT query. In the second one, rather than using the float variables, I stuck the floating point literals into the query in place of the variables. In fact, that's ALL I did, which is why I didn't format the two for the display. I didn't want even formatting to alter the fact that ALL I did was exchange variables for literals.

    The two don't have the same results. The one that uses the variables gives me the same incorrect value as the SP, while the one with the literals gives me the correct value.

    So, now it isn't even SP vs Ad Hoc, it's the use of variables vs numeric literals.

    What is SQL Server doing? Was I wrong to use float types considering the literal values I was passing in? Is SQL Server altering the values of variables in some fashion?

    One further point was that I was able to take the arguments to the SP and alter the longitude value bit by bit until it switched to the right answer. In fact, by holding latitude constant (the value shown in the second SQL, in case you are curious) and altering the longitude, I found that:

    -114.845692 the initial value (gives wrong result).
    -114.845831 produces the wrong result.
    -114.845832 produces the right result.

    Looking in Google Maps, both points (with the initial longitude versus the longitude that gives the right result) are in the same stream, and about two to three meters apart.
    My usual boring signature: Nothing

  13. #13

    Thread Starter
    Super Moderator Shaggy Hiker's Avatar
    Join Date
    Aug 2002
    Location
    Idaho
    Posts
    38,989

    Re: Puzzling Stored Procedure

    Just looked at the execution plans for the two queries shown in that last one. They are massively different.

    The one with the literals is straightforward:
    Clustered index scan ->Computer Scalar -> Hash Match -> Sort -> Select

    The one with the variables can't quite be written like that, as it has multiple nested branches.

    A pair of Index Seek each leading to a Filter, which then combine in a Hash Match. That match, along with a Key Lookup, feeds into a Nested Loops - > Compute Scalar -> Sort -> Stream Aggregate -> Sort -> Select.

    I'm pretty sure I've never seen that large a difference between two queries that were otherwise so utterly similar. The use of variables seems to be causing SQL Server to "lose it's mind"

    EDIT: The execution plan did suggest an additional index. Once that was added, the difference between the execution plans was greatly reduced. In fact, it was reduced to the point where they were quite similar, though still not the same.

    In any case, the result was correct. The SP is still returning the wrong result, but it is quite likely that it has cached it's execution plan, and causing it to jettison that would fix the immediate error. I don't have time for that, at the moment, so that will have to wait for tomorrow.
    Last edited by Shaggy Hiker; May 24th, 2023 at 03:58 PM.
    My usual boring signature: Nothing

  14. #14
    MS SQL Powerposter szlamany's Avatar
    Join Date
    Mar 2004
    Location
    Connecticut
    Posts
    18,263

    Re: Puzzling Stored Procedure

    Could it be that the variables declared force a "data type" to be used during the calc that causes loss of precision?

    In the past when I've needed a more "precise" numeric data type I would go with Decimal(), for doing things like rank-in-class and GPA calcs.

    As for the TVF, could you not run your query for just one SurveryId? Is that not what you want the output organized and grouped by?

    With a TVF, you don't JOIN to it, you CROSS APPLY to it. Basically, you GROUP down to just one row for a SURVEYID - pass that into the TVF, where you then can return a TABLE with ALL the rows along the waypoint with fields for each of the little values you want to see in detail, but are instead hidden in the calculation you are doing.

    Code:
    Select SSW.SurveyId,MSF.*
        From STRM_Survey_Waypoints SSW
        Cross Apply MyStream_TVF(SSW.SurveyId, @Latitude, @Longitude) MSF
        Group by SSW.SurveyId
    Can I have some test data to play with myself? Just enough rows to write the TVF myself and then you can test it yourself on the real data.
    Last edited by szlamany; May 25th, 2023 at 05:29 AM.

    *** Read the sticky in the DB forum about how to get your question answered quickly!! ***

    Please remember to rate posts! Rate any post you find helpful - even in old threads! Use the link to the left - "Rate this Post".

    Some Informative Links:
    [ SQL Rules to Live By ] [ Reserved SQL keywords ] [ When to use INDEX HINTS! ] [ Passing Multi-item Parameters to STORED PROCEDURES ]
    [ Solution to non-domain Windows Authentication ] [ Crazy things we do to shrink log files ] [ SQL 2005 Features ] [ Loading Pictures from DB ]

    MS MVP 2006, 2007, 2008

  15. #15

    Thread Starter
    Super Moderator Shaggy Hiker's Avatar
    Join Date
    Aug 2002
    Location
    Idaho
    Posts
    38,989

    Re: Puzzling Stored Procedure

    I appreciate the offer, but it's not worth your time for that. After all, by this point, I know how to solve the problem, but am mostly freaked out that it IS a problem. There are basically two things that I expect a database to do, the first is to store my data, the second is to allow me to retrieve my data. What I'm seeing here is that I get occasionally incorrect answers depending on whether I use variables or not. It's kind of like being used to adding two and two and usually getting four, but occasionally getting a rutabaga: If it's going wrong here, then where else will it go wrong, and how?

    Basically, I'm not so much focused on making the problem go away as understanding what caused the problem in the first place. The solution would be easy to implement (don't use a stored procedure), but that just can't be THE solution, because it's not reasonable to expect a SQL query to behave so differently without much cause.

    I was thinking that the data type might be an issue. After all, I was able to hold the latitude constant and change the longitude just enough that I got a bad result. It was a very small change, though. A change small enough that the precision difference between Decimal and Float might come into play. However, I think that's probably not the case for a variety of reasons, such as the literal was probably a float, as were the variables.

    While playing with it, though, I stumbled across a couple other intriguing points that I need to pursue more vigorously. After all, the query is only looking for the known waypoint that is closest to the one in question. This is in an area where we've been taking dozens of waypoints each year for a couple decades. So long as the point in question is on the stream, there is likely to be a waypoint very close....but the query then assumes that the waypoint in the database has been placed on the right stream. A slight imprecision in the data types, when coupled with an incorrect stream for some existing waypoints...is a possibility I need to look into.
    My usual boring signature: Nothing

  16. #16

    Thread Starter
    Super Moderator Shaggy Hiker's Avatar
    Join Date
    Aug 2002
    Location
    Idaho
    Posts
    38,989

    Re: Puzzling Stored Procedure

    Well, the immediate problem has been solved. There's still something worth understanding as to why a variable was different from a literal, but it probably comes down to a data type issue.

    I went looking at the survey on the wrong stream that was being returned. It had three good waypoints, and one bad waypoint. That bad waypoint was, by pure coincidence, almost exactly on top of the test waypoint. So, I managed to make up test data that, by coincidence, managed to land very close to an error in the underlying database. This stuff gets checked over repeatedly by numerous people, and nobody caught that one errant waypoint. Better yet, since there are many thousand waypoints on that stream, with numerous waypoints right in that area, the bad waypoint was within a meter of a couple good waypoints, so ANY shift, no matter how small, might happen to mean that I was closer to the bad one than to any of the good ones. When I fuzzed my data to create some random test points, I just happened to fuzz two of them such that they were closer to the bad point, while fuzzing a third one so that it was closer to a good point.

    Still, I learned from this that literals and variables are handled quite differently in SQL Server. The difference can result in radically different execution plans that can strongly influence performance. There's also some difference in data types, or rounding, or something like that, but I have yet to determine what that is. The literal and the variable were not the same, but only in the fifth or sixth decimal point. By chance, that was just enough.
    My usual boring signature: Nothing

  17. #17
    MS SQL Powerposter szlamany's Avatar
    Join Date
    Mar 2004
    Location
    Connecticut
    Posts
    18,263

    Re: [RESOLVED] Puzzling Stored Procedure

    That was the whole reason for my suggestion to use a TVF. They can return all the ROWS of a working resultset, kind of like seeing what is going on in that TOP 1 or GROUP BY output.

    The datatype of variables and fields in an equations - based on however the "position" of them pans out in the parsing of the formula, will drive the data typing used during the calc. If you could determine the "best" return precisions that those SIN() functions are returning, and match your VARIABLES to that - you will get cleaner results. Decimal(x,y) is going to be way more accurate than FLOAT.

    This link infers that RADIANS likes to use DECIMAL(38,0)

    https://learn.microsoft.com/en-us/sq...l-server-ver16

    *** Read the sticky in the DB forum about how to get your question answered quickly!! ***

    Please remember to rate posts! Rate any post you find helpful - even in old threads! Use the link to the left - "Rate this Post".

    Some Informative Links:
    [ SQL Rules to Live By ] [ Reserved SQL keywords ] [ When to use INDEX HINTS! ] [ Passing Multi-item Parameters to STORED PROCEDURES ]
    [ Solution to non-domain Windows Authentication ] [ Crazy things we do to shrink log files ] [ SQL 2005 Features ] [ Loading Pictures from DB ]

    MS MVP 2006, 2007, 2008

  18. #18

    Thread Starter
    Super Moderator Shaggy Hiker's Avatar
    Join Date
    Aug 2002
    Location
    Idaho
    Posts
    38,989

    Re: [RESOLVED] Puzzling Stored Procedure

    I've been in and out of meetings, but along the way, I've run into yet another puzzle about this. The errant waypoint doesn't appear to be always an errant waypoint. Off to another meeting, though, so I can't study it any further. Basically, at a glance, the latitude appears to be further north than it should be, but looking at the other waypoints from the survey...that one doesn't stand out. On a map, it is the furthest north, but it doesn't have the lowest latitude, so it is not the furthest north.

    That will probably become clear once I get a few minutes to look at it a bit more.
    My usual boring signature: Nothing

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  



Click Here to Expand Forum to Full Width