Results 1 to 7 of 7

Thread: Recursing both ways along a chain - SQL Server 2008 R2

  1. #1

    Thread Starter
    Super Moderator FunkyDexter's Avatar
    Join Date
    Apr 2005
    Location
    An obscure body in the SK system. The inhabitants call it Earth
    Posts
    7,902

    Recursing both ways along a chain - SQL Server 2008 R2

    This feels like it should be easy but I just can't quite nail it.

    I'm working on a system that logs "Persons".
    One Person can be "Merged" into another to indicate that they're actually the same person.
    In order to support Unmerging, rather than physically merging the two records and repointing all their child records in other tables, instead we hold a "Merge" record which says e.g. Person A has Been Merged into Person B. From an end users point of view Person A then ceases to exist (only people who haven't been merged into someone else appear in the UI) although, in data terms, Person A's record does still exist and has all it's children attached to it - it's just that the whole caboodle will henceforth be considered part of Person B.

    A couple of extra rules:-
    A person can only ever be merged into 1 other person.
    That person could then be merged into a third and so on. Forming a "Chain"
    Many persons can be merged into 1 person.

    Hope that all makes sense

    So here's some data that would produce a single "merged" person (ID 6) out of several original persons:-
    Code:
    Insert into MergedPersons
    (MergedPersonID, MergedIntoPersoniD)
    Values (2,3),
    (3,4),
    (4,5),
    (5,6),
    (7,6),
    (8,7),
    (9,7);
    I'm now looking for an efficient way, given a single merge pairing, to get every person id in the chain. My first thought was simply to use a recursive cte. However, because I might start with a pairing in the middle of a chain I need to recurse in both directions along it. I therefore wrote this:-
    Code:
    with cteMergeChain as
    (
          Select [Merged Person ID], [Merged Into Person ID]
          From People.[Merged Persons]
          Where [Merged Person ID] in (@PersonID1, @PersonID2)
          and [Merged Into Person ID] in (@PersonID1, @PersonID2)
          
          Union All
          
          (Select MP.[Merged Person ID], MP.[Merged Into Person ID]
          From cteMergeChain MC
          Join People.[Merged Persons] MP
               on MC.[Merged Person ID] = MP.[Merged Into Person ID]
               or MC.[Merged Into Person ID] = MP.[Merged Person ID])
    )
    Select [Merged Person ID]
    From cteMergeChain
    
    Union
    
    Select [Merged Into Person ID]
    From cteMergeChain
    ...but that creates an infinite recursion.

    Say I start with pairing (4,5). My first iteration will get me additional pairings:-
    (3,4) and (5,6). OK so far.
    But the next iteration, as well as finding (2,3) and (7,6), will return me (4,5) again. Thus begins an infinite recursion.

    My first thought was simply to add an Except to the recursing part of the cte:-
    Code:
    with cteMergeChain as
    (
          Select [Merged Person ID], [Merged Into Person ID]
          From People.[Merged Persons]
          Where [Merged Person ID] in (@PersonID1, @PersonID2)
          and [Merged Into Person ID] in (@PersonID1, @PersonID2)
          
          Union All
          
          (Select MP.[Merged Person ID], MP.[Merged Into Person ID]
          From cteMergeChain MC
          Join People.[Merged Persons] MP
               on MC.[Merged Person ID] = MP.[Merged Into Person ID]
               or MC.[Merged Into Person ID] = MP.[Merged Person ID]
          Except 
          Select [Merged Person ID], [Merged Into Person ID]
          From cteMergeChain)
    )
    Select [Merged Person ID]
    From cteMergeChain
    
    Union
    
    Select [Merged Into Person ID]
    From cteMergeChain
    However that's not allowed because "Recursive member of a common table expression 'cteMergeChain' has multiple recursive references". Which is SQL Server speak for "you can't refer back to the cte a second time".

    At the moment I've fallen back on recursing all the way along to the end of the chain in one direction, getting the ID at the end of the chain and then using that as the start point to recurse all the way back to the other end, collecting all the IDs as I go. This seems a bit wasteful though.

    I've cut this a few ways but I just can't seem to come up with a solution that recurses in both directions at the same time without causing an infinite recursion.

    Anyone got any suggestions?
    The best argument against democracy is a five minute conversation with the average voter - Winston Churchill

    Hadoop actually sounds more like the way they greet each other in Yorkshire - Inferrd

  2. #2
    Smooth Moperator techgnome's Avatar
    Join Date
    May 2002
    Posts
    34,537

    Re: Recursing both ways along a chain - SQL Server 2008 R2

    I've got an idea or two... but I'm getting ready for my annual review in a couple min here... I've done something similar... I'll see if I can remember just how I did it... but I think it involved a two-step process of walking the chain back up to the top to get the root of all things.. then back down again to get all the children along the way...

    -tg
    * I don't respond to private (PM) requests for help. It's not conducive to the general learning of others.*
    * I also don't respond to friend requests. Save a few bits and don't bother. I'll just end up rejecting anyways.*
    * How to get EFFECTIVE help: The Hitchhiker's Guide to Getting Help at VBF - Removing eels from your hovercraft *
    * How to Use Parameters * Create Disconnected ADO Recordset Clones * Set your VB6 ActiveX Compatibility * Get rid of those pesky VB Line Numbers * I swear I saved my data, where'd it run off to??? *

  3. #3
    Smooth Moperator techgnome's Avatar
    Join Date
    May 2002
    Posts
    34,537

    Re: Recursing both ways along a chain - SQL Server 2008 R2

    See if this is any help...
    it uses a recursive CTE to walk the chain from anywhere back up to the top... then uses that top as an anchor for another CTE that then walks it back down....
    Code:
    create Table MergedPersons (MergedPersonID int, MergedIntoPersonID int)
    go
    Insert into MergedPersons
    (MergedPersonID, MergedIntoPersoniD)
    Values (2,3),
    (3,4),
    (4,5),
    (5,6),
    (7,6),
    (8,7),
    (9,7);
    go
    declare @P1ID int, @P2ID int
    set @P1ID = 4; set @P2ID = 5
    ;with MergeChain as (
      select MergedPersonID, MergedIntoPersonID, 1 as Sequence
      from MergedPersons 
      where MergedPersonID = @P1ID and MergedIntoPersonID = @P2ID 
    
      union all
    
      select MP.MergedPersonID, MP.MergedIntoPersonID, MC.Sequence + 1 as Sequence
      from MergedPersons MP
      inner join MergeChain MC 
        on MP.MergedIntoPersonID = MC.MergedPersonID
    ), MergeChainTop as (
      select 
        top 1 
        MergedPersonID, MergedIntoPersonID
      from MergeChain
      order by Sequence desc
    ), MergeChain2 as (
      select MP.MergedPersonID, MP.MergedIntoPersonID, 1 as Sequence
      from MergeChainTop MCT
      inner join MergedPersons MP on MCT.MergedPersonID = MP.MergedPersonID and MCT.MergedIntoPersonID = MP.MergedIntoPersonID
    
      union all
    
      select MP.MergedPersonID, MP.MergedIntoPersonID, MC.Sequence + 1 as Sequence
      from MergedPersons MP
      inner join MergeChain2 MC 
        on MP.MergedPersonID = MC.MergedIntoPersonID
    )
    select * from MergeChain2
    go
    
    
    drop table MergedPersons
    go
    MergeChain CTE starts in the middle and returns all parent records... MergeChainTop CTE then orders them in reverse (based on the Sequence number that's generated) and returns the first (last) record, which is the top record. That is then used in the inner join as the anchor in the MergeChain2 CTE to walk the line back down.
    Results:
    Code:
    MergedPersonID         MergedIntoPersonID    Sequence
    2                               3                               1
    3                               4                               2
    4                               5                               3
    5                               6                               4
    -tg
    * I don't respond to private (PM) requests for help. It's not conducive to the general learning of others.*
    * I also don't respond to friend requests. Save a few bits and don't bother. I'll just end up rejecting anyways.*
    * How to get EFFECTIVE help: The Hitchhiker's Guide to Getting Help at VBF - Removing eels from your hovercraft *
    * How to Use Parameters * Create Disconnected ADO Recordset Clones * Set your VB6 ActiveX Compatibility * Get rid of those pesky VB Line Numbers * I swear I saved my data, where'd it run off to??? *

  4. #4

    Thread Starter
    Super Moderator FunkyDexter's Avatar
    Join Date
    Apr 2005
    Location
    An obscure body in the SK system. The inhabitants call it Earth
    Posts
    7,902

    Re: Recursing both ways along a chain - SQL Server 2008 R2

    I think it involved a two-step process of walking the chain back up to the top to get the root of all things.. then back down again to get all the children along the way...
    Yeah, that's what I'm doing at the moment. It seems kinda daft to walk back and forward across the same ground though. It feels like there should be a better way but I just can't seem to find it - maybe there isn't one.

    I got some mileage out of introducing a "direction of travel" type field so the links found in the forward direction would only look for further links in the forward direction and links in the backward direction would only look for further links in the backward direction. That would prevent it re-finding the start point. Turned out to be a dead end, though, because of this : "Many persons can be merged into 1 person". That means that when I'm walking forward I still need to look back, just not at the particular link I just walked forward from. Maybe I could have fields to hold the whole previous link... I might give that a try tomorrow.

    My gut tells me there's something in the "except" approach. After all, that's the closest analogy for what I'm trying to do on each pass: "find me any link, in either direct, which you don't already know about". I just can't seem to find a way of forcing it into the syntax though.
    The best argument against democracy is a five minute conversation with the average voter - Winston Churchill

    Hadoop actually sounds more like the way they greet each other in Yorkshire - Inferrd

  5. #5
    Smooth Moperator techgnome's Avatar
    Join Date
    May 2002
    Posts
    34,537

    Re: Recursing both ways along a chain - SQL Server 2008 R2

    The only other thing I can think of still involves two CTE's one that travels in one direction, and one that goes in the other, and then union them together... you'd have to make sure your anchor doesn't get counted twice, but that's not too difficult.

    -tg
    * I don't respond to private (PM) requests for help. It's not conducive to the general learning of others.*
    * I also don't respond to friend requests. Save a few bits and don't bother. I'll just end up rejecting anyways.*
    * How to get EFFECTIVE help: The Hitchhiker's Guide to Getting Help at VBF - Removing eels from your hovercraft *
    * How to Use Parameters * Create Disconnected ADO Recordset Clones * Set your VB6 ActiveX Compatibility * Get rid of those pesky VB Line Numbers * I swear I saved my data, where'd it run off to??? *

  6. #6

    Thread Starter
    Super Moderator FunkyDexter's Avatar
    Join Date
    Apr 2005
    Location
    An obscure body in the SK system. The inhabitants call it Earth
    Posts
    7,902

    Re: Recursing both ways along a chain - SQL Server 2008 R2

    Woot! Finally managed it. The trick was to "remember" the last person ID on each iteration and make sure I didn't re-add it. To do that I had to separate the forward and back recursion elements into two different blocks. Here's the final result:-

    SQL Code:
    1. Declare @PersonID1 bigint = 5
    2. Declare @PersonID2 bigint = 4;
    3.  
    4. with cteMergeChain as
    5. (
    6.       Select [Merged Into Person ID] PersonID, Cast(null as BigInt) as LastPersonID, 1 as Iter, cast('Start' as NVarChar(20)) as Direction
    7.       From People.[Merged Persons]
    8.       Where [Merged Person ID] = @PersonID1 and [Merged Into Person ID] = @PersonID2
    9.       Or [Merged Into Person ID] = @PersonID1 and [Merged Person ID] = @PersonID2
    10.      
    11.       Union All
    12.      
    13.       Select MP.[Merged Person ID], MC.PersonID, Iter + 1, cast('Merged Into' as NVarChar(20))
    14.       From cteMergeChain MC
    15.       Join People.[Merged Persons] MP
    16.            on MC.PersonID = MP.[Merged Into Person ID]
    17.            and not (isnull(MC.LastPersonID, 0) = MP.[Merged Person ID])
    18.            
    19.       Union All
    20.      
    21.       Select MP.[Merged Into Person ID], MC.PersonID, Iter + 1, cast('Merged From' as NVarChar(20))
    22.       From cteMergeChain MC
    23.       Join People.[Merged Persons] MP
    24.            on MC.PersonID = MP.[Merged Person ID]
    25.            and not (isnull(MC.LastPersonID, 0) = MP.[Merged Into Person ID])
    26. )
    27.  
    28. Select *
    29. From cteMergeChain

    The Iter and Direction fields aren't really needed, they're just there as part of my investigation/debugging process.

    Thanks as always for you input, TG.



    Edit> A trivial improvement is to use both the Merged From and Merged To in the root. Won't make alot of difference but could potentially save a single iteration:-
    SQL Code:
    1. Declare @PersonID1 bigint = 5
    2. Declare @PersonID2 bigint = 4;
    3.  
    4. with cteMergeChain as
    5. (
    6.       (Select [Merged Into Person ID] PersonID, [Merged Person ID] LastPersonID, 1 as Iter, cast('Start' as NVarChar(20)) as Direction
    7.       From People.[Merged Persons]
    8.       Where [Merged Person ID] = @PersonID1 and [Merged Into Person ID] = @PersonID2
    9.       Or [Merged Into Person ID] = @PersonID1 and [Merged Person ID] = @PersonID2
    10.       union
    11.       Select [Merged Person ID] PersonID, [Merged Into Person ID] LastPersonID, 1 as Iter, cast('Start' as NVarChar(20)) as Direction
    12.       From People.[Merged Persons]
    13.       Where [Merged Person ID] = @PersonID1 and [Merged Into Person ID] = @PersonID2
    14.       Or [Merged Into Person ID] = @PersonID1 and [Merged Person ID] = @PersonID2)
    15.      
    16.      
    17.       Union All
    18.      
    19.       Select MP.[Merged Person ID], MC.PersonID, Iter + 1, cast('Merged Into' as NVarChar(20))
    20.       From cteMergeChain MC
    21.       Join People.[Merged Persons] MP
    22.            on MC.PersonID = MP.[Merged Into Person ID]
    23.            and not (isnull(MC.LastPersonID, 0) = MP.[Merged Person ID])
    24.            
    25.       Union All
    26.      
    27.       Select MP.[Merged Into Person ID], MC.PersonID, Iter + 1, cast('Merged From' as NVarChar(20))
    28.       From cteMergeChain MC
    29.       Join People.[Merged Persons] MP
    30.            on MC.PersonID = MP.[Merged Person ID]
    31.            and not (isnull(MC.LastPersonID, 0) = MP.[Merged Into Person ID])
    32. )
    33.  
    34. Select *
    35. From cteMergeChain
    Last edited by FunkyDexter; Feb 1st, 2018 at 06:30 AM.
    The best argument against democracy is a five minute conversation with the average voter - Winston Churchill

    Hadoop actually sounds more like the way they greet each other in Yorkshire - Inferrd

  7. #7
    Smooth Moperator techgnome's Avatar
    Join Date
    May 2002
    Posts
    34,537

    Re: Recursing both ways along a chain - SQL Server 2008 R2

    Sweet! I'll have to remember this thread. We've got a similar setup for this same type scenario, merging of records. We've also got quite a few tables that also make use of the HIERARCHYID type in SQL Server that I find myself needing to traverse. Something like this might come in handy.

    -tg
    * I don't respond to private (PM) requests for help. It's not conducive to the general learning of others.*
    * I also don't respond to friend requests. Save a few bits and don't bother. I'll just end up rejecting anyways.*
    * How to get EFFECTIVE help: The Hitchhiker's Guide to Getting Help at VBF - Removing eels from your hovercraft *
    * How to Use Parameters * Create Disconnected ADO Recordset Clones * Set your VB6 ActiveX Compatibility * Get rid of those pesky VB Line Numbers * I swear I saved my data, where'd it run off to??? *

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  



Click Here to Expand Forum to Full Width