
Thread: NEWBY Struggling with Conditional Text Merge Problem

  1. #1

    Thread Starter
    Junior Member
    Join Date
    May 2010
    Posts
    19

    NEWBY Struggling with Conditional Text Merge Problem

    I hope someone can take some time out of their busy schedule to point me in the right direction on how I should tackle this project. My immature programming head is just not grasping it at present; maybe I am overthinking it!

    I basically want to construct a conditional merge routine for a text-readable file. One field contains a date, to which I would like to assign a “quality” factor. For another field, containing place detail, I would use the number of comma delimiters as an indication of quality: more commas mean more components and therefore more detail. An example portion of the original file structure is below; this is a GEDCOM file (genealogy information).

    Code:
    0 @I1@ INDI ‘ start of information for individual#1
    ..
    1 BIRT
    2 DATE 1868 ‘ third highest quality score date
    2 PLAC New York, America
    ..
    1 BIRT 
    2 DATE Mar 1868 ‘ second highest quality score date
    1 BIRT
    2 DATE 18 Mar 1868 ‘ highest quality score date
    2 PLAC America
    ..
    1 BIRT
    2 DATE abt 1868 ‘ fourth highest quality score date
    2 PLAC Sullivan County, NY, United States
    ..
    0 @I2@ INDI ‘ start of information for individual#2
    The constants are:
    “0 @I” starts each new person’s information block.
    “1” starts each new event within the “0 @I” tags, in this case “1 BIRT” (birth)

    The number of lines appearing between the “0 @I” tags and the “1 ” tags is totally variable. The “2 DATE” and “2 PLAC” lines may or may not appear.

    Code:
    1 BIRT
    2 DATE 18 Mar 1868 ‘most accurate date and same year as others
    2 PLAC Sullivan County, NY, United States
    What I would like to work towards in this case is to reconstruct the information into one Birth event with the most accurate information, like the example above, and discard the additional lower-quality information. Unfortunately, at the moment I don’t have the experience to jump to the right solution.

    My thoughts were to read everything between the “0 @I” tags into a fileINPUT array, reconstruct the information in a fileOUTPUT array, write it back to an output file, and also write the changed information to a log file for auditing.

    This is maybe far too ambitious for me at my stage of learning but no pain no gain, I believe once I get some experienced direction on how I should be tackling this problem and maybe a short example then I will start to achieve some real results.

    In hope of your indulgence and a bit of patience, from a family historian (and newbie programmer).

  2. #2
    PowerPoster Ellis Dee
    Join Date
    Mar 2007
    Location
    New England
    Posts
    3,530

    Re: NEWBY Struggling with Conditional Text Merge Problem

    I would approach it like this:

    1) Read in the entire contents of the file to a single string variable

    2) Use the Split(<string>, vbNewLine) function to create an array of lines from the string variable

    3) Iterate the array, one line at a time, parsing out the data as desired
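    For illustration, the three steps might look like this in VB6. This is only a sketch: the function name CountBirthEvents is hypothetical, the "parsing" just counts "1 BIRT" lines, and it assumes the file uses Windows (CRLF) line endings so that splitting on vbNewLine works.

```vb
Option Explicit

' Sketch of the three steps: read the whole file, split it into lines,
' then walk the lines. The "parsing" here only counts "1 BIRT" lines.
Public Function CountBirthEvents(ByVal path As String) As Long
    Dim fileNum As Integer
    Dim contents As String
    Dim lines() As String
    Dim i As Long
    Dim total As Long

    ' 1) Read the entire file into one string
    fileNum = FreeFile
    Open path For Binary Access Read As #fileNum
    contents = Space$(LOF(fileNum))
    Get #fileNum, , contents
    Close #fileNum

    ' 2) Split into an array of lines (assumes CRLF line endings)
    lines = Split(contents, vbNewLine)

    ' 3) Iterate one line at a time
    For i = 0 To UBound(lines)
        If Left$(lines(i), 6) = "1 BIRT" Then total = total + 1
    Next i

    CountBirthEvents = total
End Function
```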

  3. #3

    Thread Starter
    Junior Member
    Join Date
    May 2010
    Posts
    19

    Re: NEWBY Struggling with Conditional Text Merge Problem

    Thanks. I should have given some idea of the size of the file: 69 MB, with over 100K individuals. That's why I was going to attack it chunk by chunk.

    I must look more into the use of the Split and Join functions, though.

  4. #4
    PowerPoster
    Join Date
    Dec 2004
    Posts
    25,618

    Re: NEWBY Struggling with Conditional Text Merge Problem

    you can also split the file on "0 @I" so you would have an array with each person as an element
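    A minimal sketch of that split, assuming the whole file has already been read into one string with CRLF line endings. Splitting on vbNewLine & "0 @I" rather than plain "0 @I" only matches the marker at the start of a line. SplitIntoPeople is a hypothetical name. Element 0 holds the header, and the last element also carries any family ("0 @F") and trailer records that follow the final individual.

```vb
Option Explicit

' Splits the text of a GEDCOM file into one string per individual.
' Split() removes the delimiter, so "0 @I" is put back on each person block.
Public Function SplitIntoPeople(ByVal fileText As String) As String()
    Dim parts() As String
    Dim i As Long

    parts = Split(fileText, vbNewLine & "0 @I")
    For i = 1 To UBound(parts)           ' element 0 is the header, leave it
        parts(i) = "0 @I" & parts(i)
    Next i
    SplitIntoPeople = parts
End Function
```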

    to score, you can use something like
    vb Code:
    '  for dates
    score = UBound(Split(Mid(mystr, 8)))
    ' or for places
    score = UBound(Split(Mid(mystr, 8), ",")) + 1
    where mystr is a date or place line; you would need to test whether a delimiter exists to avoid an error

    something like
    vb Code:
    If InStr(mystr, "PLAC") > 0 Then
      If InStr(mystr, ",") > 0 Then
        score = UBound(Split(Mid(mystr, 8), ",")) + 1
      ElseIf Len(mystr) < 9 Then
        score = 0
      Else
        score = 1
      End If
    End If
    pete

  5. #5
    PowerPoster Spoo
    Join Date
    Nov 2008
    Location
    Right Coast
    Posts
    2,656

    Re: NEWBY Struggling with Conditional Text Merge Problem

    Lando

    I was initially puzzled by why BIRT was always "blank", but
    I think I get it now .. the following is a repeat of your example,
    indents added for illustrative purposes.
    Code:
    0 @I1@ INDI ‘ start of information for individual#1
      ..
      1 BIRT
         2 DATE 1868 ‘ third highest quality score date
         2 PLAC New York, America
      ..
      1 BIRT 
         2 DATE Mar 1868 ‘ second highest quality score date
      1 BIRT
         2 DATE 18 Mar 1868 ‘ highest quality score date
         2 PLAC America
      ..
      1 BIRT
         2 DATE abt 1868 ‘ fourth highest quality score date
         2 PLAC Sullivan County, NY, United States
      ..
    0 @I2@ INDI ‘ start of information for individual#2
    It kinda makes more sense, visually, to me now.
    1. Raw source data
      • Each "1 " tag represents a different source regarding BIRT data.
      • Most have a DATE, only some have a PLAC
      • The ".." seems to represent where other "1 " tags could appear, possibly such as DEAT. Thus, for now, ignore.
    2. Output goal: for a given individual, "pick" best DATE and PLAC info
      • DATE and PLAC do not necessarily need to be from same source
      • DATE quality algo
        • 1 .. 18 Mar 1868 --- full dd/mmm/yyyy
        • 2 .. Mar 1868 --- mmm/yyyy
        • 3 .. 1868 --- yyyy
        • 4 .. abt 1868 --- qualified yyyy
      • PLAC quality algo - count commas
        • 1 .. [2] Sullivan County, NY, United States
        • 2 .. [1] New York, America
        • 3 .. [0] America
        • 4 .. [?] <empty>
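    The ranking above could be coded roughly as follows. This is a sketch under a few assumptions: the function names are hypothetical, higher scores mean better quality (the reverse of the 1-4 ranking), the place score is commas + 1 (the same order as counting commas), and the list of date qualifiers (abt, cal, est, bef, aft, bet) is taken from common GEDCOM date modifiers rather than from this thread.

```vb
Option Explicit

' Scores the text after "2 DATE ": dd mmm yyyy = 4, mmm yyyy = 3,
' yyyy = 2, qualified date (abt/cal/est/bef/aft/bet) = 1, empty = 0.
Public Function DateScore(ByVal dateText As String) As Long
    Dim parts() As String

    dateText = Trim$(dateText)
    If Len(dateText) = 0 Then Exit Function        ' returns 0
    parts = Split(dateText, " ")
    Select Case LCase$(parts(0))
        Case "abt", "cal", "est", "bef", "aft", "bet"
            DateScore = 1
        Case Else
            DateScore = UBound(parts) + 2          ' 1 word = 2, 2 words = 3, 3 words = 4
    End Select
End Function

' Scores the text after "2 PLAC " by its number of comma-separated
' components (commas + 1); an empty place scores 0.
Public Function PlaceScore(ByVal placeText As String) As Long
    placeText = Trim$(placeText)
    If Len(placeText) > 0 Then PlaceScore = UBound(Split(placeText, ",")) + 1
End Function
```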

    OK.. now, to cogitate for a bit.
    Feel free to jump in and correct me where I'm wrong.

    Spoo

  6. #6

    Thread Starter
    Junior Member
    Join Date
    May 2010
    Posts
    19

    Re: NEWBY Struggling with Conditional Text Merge Problem

    • Each "1 " tag represents a different source regarding BIRT data.
    CORRECT
    • Most have a DATE, only some have a PLAC
    Most have DATE and PLAC, but not always.
    • The ".." seems to represent where other "1 " tags could appear, possibly such as DEAT. Thus, for now, ignore.
    Sorry for the confusion: the ".." is my separator, and actually many more lines of "2 " and even "3 " might appear there. Each Birth, Death, Marriage or Occupation EVENT will always start with "1 " and will always be immediately followed by DATE and then PLAC when they hold data. If the data does not exist, "2 DATE" or "2 PLAC" (or both) may not appear.
    • Output goal: for a given individual, "pick" best DATE and PLAC info
    CORRECT
    • DATE and PLAC do not necessarily need to be from same source
    CORRECT. The goal would be to create one EVENT with the best-quality information from multiple EVENTS, if they exist:
    1 BIRT
    2 DATE best quality date
    2 PLAC best quality place
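    One way to produce that single merged event, sketched under some assumptions: the individual's lines are already in a String array, each event ends at the next "0 " or "1 " line, and any other lines found under the BIRT events (sources, notes) are kept after the merged DATE and PLAC rather than discarded. MergeBirthEvents and its helpers are hypothetical names, and the scoring functions are a compact version of the ranking in post #5 so the block stands alone.

```vb
Option Explicit

' Compact scoring: qualified date = 1, otherwise words + 1 (yyyy = 2 ... dd mmm yyyy = 4)
Private Function DateScore(ByVal d As String) As Long
    Dim parts() As String
    d = Trim$(d)
    If Len(d) = 0 Then Exit Function
    parts = Split(d, " ")
    Select Case LCase$(parts(0))
        Case "abt", "cal", "est", "bef", "aft", "bet": DateScore = 1
        Case Else: DateScore = UBound(parts) + 2
    End Select
End Function

' Number of comma-separated place components; empty = 0
Private Function PlaceScore(ByVal p As String) As Long
    p = Trim$(p)
    If Len(p) > 0 Then PlaceScore = UBound(Split(p, ",")) + 1
End Function

' True for lines that start a new record ("0 ") or a new event ("1 ")
Private Function IsBoundary(ByVal s As String) As Boolean
    IsBoundary = (Left$(s, 2) = "0 " Or Left$(s, 2) = "1 ")
End Function

' Replaces all "1 BIRT" events with one event at the first BIRT's position.
Public Function MergeBirthEvents(lines() As String) As String()
    Dim bestDate As String, bestPlace As String, v As String
    Dim extra As New Collection
    Dim outLines() As String
    Dim i As Long, n As Long
    Dim inBirth As Boolean, emitted As Boolean
    Dim item As Variant

    ' Pass 1: pick the best DATE and PLAC across all BIRT events
    For i = LBound(lines) To UBound(lines)
        v = Mid$(lines(i), 8)
        If IsBoundary(lines(i)) Then
            inBirth = (Left$(lines(i), 6) = "1 BIRT")
        ElseIf inBirth And Left$(lines(i), 7) = "2 DATE " Then
            If DateScore(v) > DateScore(bestDate) Then bestDate = v
        ElseIf inBirth And Left$(lines(i), 7) = "2 PLAC " Then
            If PlaceScore(v) > PlaceScore(bestPlace) Then bestPlace = v
        ElseIf inBirth Then
            extra.Add lines(i)                 ' other lines under a BIRT
        End If
    Next i

    ' Pass 2: copy non-BIRT lines; emit the merged event once
    ReDim outLines(0 To UBound(lines) - LBound(lines))
    inBirth = False
    For i = LBound(lines) To UBound(lines)
        If IsBoundary(lines(i)) Then inBirth = (Left$(lines(i), 6) = "1 BIRT")
        If Not inBirth Then
            outLines(n) = lines(i)
            n = n + 1
        ElseIf Not emitted Then
            outLines(n) = "1 BIRT"
            n = n + 1
            If Len(bestDate) > 0 Then
                outLines(n) = "2 DATE " & bestDate
                n = n + 1
            End If
            If Len(bestPlace) > 0 Then
                outLines(n) = "2 PLAC " & bestPlace
                n = n + 1
            End If
            For Each item In extra
                outLines(n) = item
                n = n + 1
            Next item
            emitted = True
        End If
    Next i

    ReDim Preserve outLines(0 To n - 1)        ' trim unused slots
    MergeBirthEvents = outLines
End Function
```

    Writing the discarded DATE/PLAC values to the audit log mentioned in post #1 would fit naturally into pass 1.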

  7. #7
    PowerPoster Spoo
    Join Date
    Nov 2008
    Location
    Right Coast
    Posts
    2,656

    Re: NEWBY Struggling with Conditional Text Merge Problem

    Lando

    In addition to what has already been suggested,
    here is an alternative. It is a rather "brute force"
    approach that uses 3 arrays.

    It will do the:
    • parsing -- aaRAW()
    • segmenting -- aaDATE() and aaPLAC() (eg, keep all dates for INDI 1 in "1 place")
    • it does not tackle the "rating" step yet ... baby steps ..

    Code:
    ' 1. dump txt file into array
    Dim aaRaw()
    Dim aaDATE(), aaPLAC()
    '
    size1 = 13      ' possibly set to 1,000,000 - for aaRAW()
    size2 = 4       ' possibly set to 100       - for aaDATE and aaPLAC 2nd dimension
    fn = "D:\VBForum Stuff\Txt Files\lando geneo.txt"
    Open fn For Input As #1
    ee = 0
    nid = 0
    ReDim aaRaw(size1)
    Do While Not EOF(1)
        Line Input #1, aaRaw(ee)
        nid = nid + IIf(Left(aaRaw(ee), 1) = "0", 1, 0) ' count INDI's, for aaDATE and aaPLAC 1st dimension
        ee = ee + 1
    Loop
    Close #1
    ' 2. segment by individual
    ReDim aaDATE(nid - 1, size2), aaPLAC(nid - 1, size2)
    nn = -1
    For ii = 0 To ee - 1
        ' incr INDI
        If Left(aaRaw(ii), 1) = "0" Then
            nn = nn + 1
            dct = 0
            pct = 0
        End If
        zz = Left(aaRaw(ii), 6)
        ' populate aaDATE for this INDI
        If zz = "2 DATE" Then
            aaDATE(nn, dct) = Mid(aaRaw(ii), 8)
            dct = dct + 1
        ' populate aaPLAC for this INDI
        ElseIf zz = "2 PLAC" Then
            aaPLAC(nn, pct) = Mid(aaRaw(ii), 8)
            pct = pct + 1
        End If
    Next ii
    I hope this gives you some ideas. Holler if you have
    any questions.

    EDIT:
    Just saw your post .. you beat me by 2 minutes !
    I think we're on the same page. More later.
    Time for Letterman,


    Attached image shows:
    1. aaRAW()
    2. aaDATE() - aaDATE(0) contains all dates for INDI 1
    3. aaPLAC() - aaPLAC(0) contains all places for INDI 1

    Spoo

    Last edited by Spoo; Jul 2nd, 2010 at 10:41 PM. Reason: just saw your post

  8. #8

    Thread Starter
    Junior Member
    Join Date
    May 2010
    Posts
    19

    Re: NEWBY Struggling with Conditional Text Merge Problem

    Thanks, time for bed here, just coming off night shift...

    I have attached an example GEDCOM with only one individual with various duplications. This is what I am trying to eradicate, and I'm sure you will get the idea of how the tags are laid out.

    Each EVENT runs from a "1 " tag to the next "1 " tag, e.g. BIRT is the identifier. I am primarily only interested in the 3 or 4 lines after the EVENT start (BIRT in this example).

    Thx and night night zzzzzzzzzzzzz

  9. #9
    PowerPoster Spoo
    Join Date
    Nov 2008
    Location
    Right Coast
    Posts
    2,656

    Re: NEWBY Struggling with Conditional Text Merge Problem

    Lando

    I tried to look at your attached file, but nothing happens.

    That is, it turns into a .GED file, but when I try to move
    it from zip folder to another folder (by dragging), nothing
    appears. Perhaps you could make a copy of it and rename
    it using a .TXT extension.

    Spoo

  10. #10

    Thread Starter
    Junior Member
    Join Date
    May 2010
    Posts
    19

    Re: NEWBY Struggling with Conditional Text Merge Problem

    Morning, I have just recreated another Gedcom file as txt and attached it, I have split it to help explanation and annotated it.

    This sort of thing might be a piece of cake to you, but although I could fix it with copy and paste (it would take a while to work through 100k records), the programming is a bit beyond me at present.

    So far I have been able to sequentially read the input file, look for, say, "2 PLAC", modify it and output it again. Then I moved on to the first 3 lines of any event, but this is a quantum leap for me.

    I am sure when I get the fundamentals of how I should be approaching this I can work to understand it and tailor things to my own needs.

    Thanks for your time and consideration, I really do appreciate it.

  11. #11
    PowerPoster Spoo
    Join Date
    Nov 2008
    Location
    Right Coast
    Posts
    2,656

    Re: NEWBY Struggling with Conditional Text Merge Problem

    Lando

    OK, thanks, that one I can read just fine.

    But first, some questions:
    1. Where do you stand regarding your OP (original post #1)?
      • still have questions
      • solved
      • solved but now going to the "next" level per attached file
    2. Which approach(es) did you adopt?
      • Ellis Dee's -- post #2
      • Westconn's -- post #4
      • mine -- post #7
    3. Did you manage to read the entire 69 Meg file using any of the approaches?

    Spoo
    Last edited by Spoo; Jul 3rd, 2010 at 12:56 PM. Reason: add question 3

  12. #12

    Thread Starter
    Junior Member
    Join Date
    May 2010
    Posts
    19

    Re: NEWBY Struggling with Conditional Text Merge Problem

    As you can see now, my original post contained very truncated information, but it still did contain the important fields which start any new event.

    I have not adopted any approach yet for this problem. Ellis Dee's approach would have been fine on the example I gave, but to me this is a much bigger task (maybe not; just ignorance on my part!).

    Westconn1's approach was what my immature programming mind was originally thinking: read each person between the "0 @" tags into an array. That still leaves me the problem of how to reconcile the info, which would not fall within my experience to date.

    Yours is looking good although I would never have thought in array dimensions of this size (ignorance again eh!) but another thing I have learnt.

    The small routines I have been creating were working on only 1, or maybe a set of 3, consecutive lines. For example, some PLAC lines contain “Belfast vol 1 page 123 Sep Qtr” and the DATE contains the year only. In this case I would look for a PLAC containing Sep Qtr with InStr and amend the year from "2 DATE 1888" to "2 DATE SEP Q 1888", making it more accurate; this is a valid date format as recognized by my genealogy program.
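    That quarter fix-up could be wrapped in a small function like the one below. It is a sketch: AmendQuarterDate is a hypothetical name, it only amends a bare four-digit year, and recognising all four quarter names (Mar/Jun/Sep/Dec) is an assumption extending the single Sep example.

```vb
Option Explicit

' If the place mentions a registration quarter ("Sep Qtr" etc.) and the
' date is a bare year, return e.g. "SEP Q 1888"; otherwise return the date as-is.
Public Function AmendQuarterDate(ByVal dateText As String, ByVal placeText As String) As String
    Dim quarters As Variant
    Dim q As Variant

    AmendQuarterDate = dateText
    dateText = Trim$(dateText)
    ' only amend a bare four-digit year
    If Not (Len(dateText) = 4 And IsNumeric(dateText)) Then Exit Function

    quarters = Array("Mar", "Jun", "Sep", "Dec")
    For Each q In quarters
        If InStr(1, placeText, q & " Qtr", vbTextCompare) > 0 Then
            AmendQuarterDate = UCase$(q) & " Q " & dateText
            Exit Function
        End If
    Next q
End Function
```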

    I'm afraid that is my level at present, although I am learning all the time from reading this forum and the contributions from guys like you. I don't expect to ever be an expert programmer, but I know there is a lot more I can learn.

  13. #13
    PowerPoster Spoo's Avatar
    Join Date
    Nov 2008
    Location
    Right Coast
    Posts
    2,656

    Re: NEWBY Struggling with Conditional Text Merge Problem

    Lando

    ... although I would never have thought in array dimensions of this size ...
    Haha .. If you are referring to my 1,000,000 element
    array, well, I did say "brute force" method, didn't I?

    Let's just take a moment to explore that.
    I'd say that there are 4 basic approaches:
    1. undersized -- will get Run-time error '9' - Subscript out of range if array is too small
    2. massively oversize -- to avoid such a run-time error (my 1,000,000 idea)
    3. use ReDim Preserve each time -- efficient (RAM-wise), but slows things down
    4. use ReDim Preserve at intervals (say 5000) -- efficient and fast
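    Approach 4 could look something like this sketch, where AppendLine and TrimToCount are hypothetical helpers: the array grows 5000 slots at a time, a separate counter tracks how many slots are really used, and one final ReDim Preserve trims the unused tail.

```vb
Option Explicit

Private Const CHUNK As Long = 5000   ' growth step

' Appends a value to a 1-D dynamic array, growing it CHUNK slots at a time.
' count holds the number of slots actually in use.
Public Sub AppendLine(arr() As String, count As Long, ByVal value As String)
    If count = 0 Then
        ReDim arr(0 To CHUNK - 1)                    ' first allocation
    ElseIf count > UBound(arr) Then
        ReDim Preserve arr(0 To UBound(arr) + CHUNK) ' keep existing data
    End If
    arr(count) = value
    count = count + 1
End Sub

' Drops the unused slots once all values have been appended.
Public Sub TrimToCount(arr() As String, ByVal count As Long)
    If count > 0 Then ReDim Preserve arr(0 To count - 1)
End Sub
```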

    ReDim (without the Preserve keyword) wipes out any existing
    contents of the array.
    • If you are doing this at the outset, then no problems
    • If you are doing this when the array does contain data, big problem; hence the use of Preserve.

    There is one caveat regarding ReDim Preserve: you can only
    change the upper bound of the last dimension


    So, if you have a 1-D array (such as proposed aaRaw) -- no issues
    But, if you have a 2-D array (such as proposed aaDATE) -- potential issue !!

    We can explore the "potential issue" thingy further, if you are interested.

    Spoo

  14. #14

    Thread Starter
    Junior Member
    Join Date
    May 2010
    Posts
    19

    Re: NEWBY Struggling with Conditional Text Merge Problem

    You see I read a lot of stuff here where folks are optimizing for speed, for me it’s not really an issue as this will be an infrequently run routine. All I want to achieve is a routine I can understand, tweak and work with.

    I had read about ReDim and Redim Preserve but have not thought about 2-D arrays yet and did not understand the potential issues.

    For this type of project, do you believe the 2-D array approach is the best? I suppose, since this is a very infrequent run, I could read through the file on a pre-run and collect some data to help dim the arrays?

  15. #15

    Thread Starter
    Junior Member
    Join Date
    May 2010
    Posts
    19

    Re: NEWBY Struggling with Conditional Text Merge Problem

    Spoo

    I am in work at present and still playing catch-up. I read your post #7 again with a clearer head and it is very interesting, although I suppose we now also have the extra element of various EVENT types, so another array?

    I was not familiar with the IIf function (ignorance again, who said it was bliss?) but now I can see how it works.

    Obviously being in work I can't be trying any of this but I will do when I go off shift tomorrow and get some sleep.

    BTW, I am on the right coast also, just not your right coast

  16. #16
    PowerPoster Ellis Dee
    Join Date
    Mar 2007
    Location
    New England
    Posts
    3,530

    Re: NEWBY Struggling with Conditional Text Merge Problem

    This is the type of thing that screams for a UDT array instead of a two-dimensional array.

    How many different kinds of data types are there?

    (And no, you don't want to read the whole 69mb file in at once as I originally suggested. After a few megs you're better off reading sequentially.)
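    A sketch of that sequential style, assuming CR or CRLF line endings (Line Input does not split on a bare LF): each level-0 record, such as one "0 @I..@ INDI" person, is gathered into a small buffer and written out before the next one is read, so only one record is in memory at a time. StreamRecords and FlushRecord are hypothetical names, and FlushRecord simply copies the record unchanged.

```vb
Option Explicit

' Reads inPath line by line, buffering one level-0 record at a time and
' writing it to outPath. Returns the number of "0 @I" (individual) records.
Public Function StreamRecords(ByVal inPath As String, ByVal outPath As String) As Long
    Dim inNum As Integer, outNum As Integer
    Dim textLine As String
    Dim buf() As String
    Dim n As Long, people As Long

    ReDim buf(0 To 99)
    inNum = FreeFile
    Open inPath For Input As #inNum
    outNum = FreeFile
    Open outPath For Output As #outNum
    Do While Not EOF(inNum)
        Line Input #inNum, textLine
        If Left$(textLine, 2) = "0 " And n > 0 Then
            FlushRecord outNum, buf, n                 ' previous record is complete
            n = 0
        End If
        If Left$(textLine, 4) = "0 @I" Then people = people + 1
        If n > UBound(buf) Then ReDim Preserve buf(0 To UBound(buf) + 100)
        buf(n) = textLine
        n = n + 1
    Loop
    If n > 0 Then FlushRecord outNum, buf, n           ' last record
    Close #outNum
    Close #inNum
    StreamRecords = people
End Function

Private Sub FlushRecord(ByVal fileNum As Integer, buf() As String, ByVal n As Long)
    Dim i As Long
    ' A per-person fix-up would be applied to buf(0 To n - 1) here.
    For i = 0 To n - 1
        Print #fileNum, buf(i)
    Next i
End Sub
```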

  17. #17

    Thread Starter
    Junior Member
    Join Date
    May 2010
    Posts
    19

    Re: NEWBY Struggling with Conditional Text Merge Problem

    Ellis Dee, you will see an example of the file structure attached to post#10.

    For someone at my stage in VB it's melting my head, although not as much as it was yesterday, so that's good.

  18. #18
    PowerPoster Spoo's Avatar
    Join Date
    Nov 2008
    Location
    Right Coast
    Posts
    2,656

    Re: NEWBY Struggling with Conditional Text Merge Problem

    Quote Originally Posted by Lando
    You see I read a lot of stuff here where folks are optimizing for speed, for me it’s not really an issue as this will be an infrequently run routine. All I want to achieve is a routine I can understand, tweak and work with.

    I had read about ReDim and Redim Preserve but have not thought about 2-D arrays yet and did not understand the potential issues.

    For this type of project, do you believe the 2-D array approach is the best? I suppose, since this is a very infrequent run, I could read through the file on a pre-run and collect some data to help dim the arrays?
    Point taken about infrequently run app.

    As for is "the 2-D array approach best" .. if you were thinking of using
    my approach, keep these points in mind:
    1. aaRaw() -- the huge array -- it is proposed to be only 1-D.
    2. aaDATE() and aaPLAC() -- these were proposed to be 2-D
      • my "way" of thinking about arrays is to think of an Excel spreadsheet (as in the R1C1 reference style)
      • this is definitely not the ONLY way .. flipping things 90 degrees has its strong points
      • anyway, using the R1C1 approach, in the attached image in my post #7, I proposed:
        • 1st dim is rows -- one per INDI (ie, per individual)
        • 2nd dim is cols -- one per source
      • if you do a complete "read" of the 69 Meg data file per my step 1, you will know the number of INDIs
        • hence you can specify 'number of INDIs' as 1st dim, and it won't change
        • 2nd dim can be ReDim Preserved as needed, with no harmful effects.
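    A sketch of that layout, borrowing the aaDATE name from post #7; InitDates and SetDate are hypothetical helpers. The rows (individuals) are fixed by the counting pass, and only the column bound (sources) is grown with ReDim Preserve, which is the one bound Preserve is allowed to change.

```vb
Option Explicit

' Rows = individuals (fixed after the counting pass), columns = sources.
Public aaDATE() As String
Private colCount As Long

Public Sub InitDates(ByVal numIndi As Long)
    colCount = 4                                   ' initial guess at sources per person
    ReDim aaDATE(0 To numIndi - 1, 0 To colCount - 1)
End Sub

Public Sub SetDate(ByVal indi As Long, ByVal src As Long, ByVal value As String)
    If src > colCount - 1 Then
        colCount = src + 4                         ' grow with some headroom
        ' allowed: the first dimension is restated unchanged, only the last grows
        ReDim Preserve aaDATE(0 To UBound(aaDATE, 1), 0 To colCount - 1)
    End If
    aaDATE(indi, src) = value
End Sub
```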


    Hope that isn't going overboard !

    Spoo

  19. #19
    PowerPoster Spoo's Avatar
    Join Date
    Nov 2008
    Location
    Right Coast
    Posts
    2,656

    Re: NEWBY Struggling with Conditional Text Merge Problem

    Lando

    Reading Ellis's post #16 and your reply in #17, it appears that
    you are now going down a different route, seeing as he and I
    are proposing differing approaches.

    I'll back off, therefore, to avoid confusion.
    Holler if you want me to continue.

    EDIT:
    FWIW, I've used the "brute force" approach on 90 Meg CSV files
    consisting of 2,000,000 lines of data. Piece of cake. Takes around
    20 seconds to go through the entire file.

    Spoo
    Last edited by Spoo; Jul 3rd, 2010 at 04:26 PM. Reason: dealing with large files

  20. #20
    PowerPoster dilettante
    Join Date
    Feb 2006
    Posts
    24,487

    Re: NEWBY Struggling with Conditional Text Merge Problem

    This is one horrendous, nasty file format for something meant to be for machine to machine data exchange. Worse yet they appear to often be encoded using some funky proprietary character encoding instead of ASCII or Unicode from what I've read.

    You might try searching for one of several free format converter utilities out there. You'll be far ahead after first converting this into a Jet MDB or some other database format. Avoid XML or the darned thing will just get enormous on you.

  21. #21
    PowerPoster Ellis Dee
    Join Date
    Mar 2007
    Location
    New England
    Posts
    3,530

    Re: NEWBY Struggling with Conditional Text Merge Problem

    Quote Originally Posted by Lando
    Ellis Dee, you will see an example of the file structure attached to post#10.
    Yes, downloading that is what prompted my question. I see a bunch of data types in that sample data snippet. I was wondering: is there an exhaustive list of data types?

    dilettante's idea to look for a file converter (preferably to Access) is solid. That could help simplify this process by a fair amount.

    Apart from an off-the-shelf converter, a raw data converter isn't particularly hard to write from scratch. It's the next step of massaging that data where it gets tricky.

  22. #22

    Thread Starter
    Junior Member
    Join Date
    May 2010
    Posts
    19

    Re: NEWBY Struggling with Conditional Text Merge Problem

    Spoo, please come back! I did not intend to change direction; I thought that, with the different elements, the UDT array was logical.

    dilettante, this is the GEDCOM standard for exchange between genealogy programs and not what is used day to day; it is just an easier format for me to work with.

    Ellis Dee, I knew massaging the data was going to be tricky but once I get some kind of basis to work on that I understand I can work with the logic myself.

    All, thanks for taking time to work with me, I have attached another txt file which maybe helps explain better what I am trying to achieve for, in this instance, the Birth Event.

  23. #23
    PowerPoster Spoo
    Join Date
    Nov 2008
    Location
    Right Coast
    Posts
    2,656

    Re: NEWBY Struggling with Conditional Text Merge Problem

    Lando

    LOL .. I'm back !!

    Not being one who has ever worked with UDTs, I stand
    to learn something about them along with you !

    Spoo

  24. #24
    PowerPoster Ellis Dee
    Join Date
    Mar 2007
    Location
    New England
    Posts
    3,530

    Re: NEWBY Struggling with Conditional Text Merge Problem

    In that case, here's a quick primer on UDTs. I'll use generic data not specific to this project because I still don't see how it all fits together yet.

    Think of a UDT array like a database table or a spreadsheet. Each element in the array acts as one row. Each piece of that row can be typed differently and strongly, unlike with a two-dimensional array, where either every piece of data must be the same type or every piece of data must be a Variant. Due to their nature, udt arrays are self-documenting, unlike arrays where everything is an index and nothing makes sense without careful inspection.

    That's the basics, and even that alone is quite powerful. It gets better, though. Much better. But here's an example of how a simple udt array might hold, say, employee info.

    First you put the udt definition at the top of a module, as you would constants. These are often public definitions, and if so they must go in a bas module. They can, however, be private inside forms or classes. In that case, the form or class must encapsulate them completely. In other words, no other module/form/class can directly reference the udt definition in any way. (You can't pass one as a parameter to a function in the class or form.)
    Code:
    Option Explicit
    
    Public Type EmployeeType
        EmployeeID As Long
        Name As String
        Title As String
        Phone As String
        Email As String
    End Type
    Once you have your definition, you can now declare variables (or arrays, usually) of that type. The individual elements in the udt act just like properties.
    Code:
    Dim typEmployee As EmployeeType
    
    typEmployee.EmployeeID = 1234
    typEmployee.Name = "John Smith"
    typEmployee.Title = "Drone"
    The With...End With statement can be quite useful, but avoid prematurely leaving a With...End With block using GoTo, Exit Do, or Exit For.

    Defining a udt as a dynamic array lets you turn it into an honest-to-god table, right in memory.
    Code:
    Dim typEmployee() As EmployeeType
    Dim i As Long
    
    ReDim Preserve typEmployee(100)
    For i = 0 To 100
        With typEmployee(i)
            Debug.Print .EmployeeID & ": " & .Name
        End With
    Next
    Note that udts free you from having to worry about reserved words like Name. And as a general rule I never use prefixes in udt elements, just like I don't use prefixes when I name fields in databases.

    Now we have the means to create an entire relational database, with a separate udt array for each table and each table having an ID field we can search on. Sorting must be done manually; I usually use Combsort (link in signature) for sorting udt arrays because that algorithm is efficient and easy to adapt.

    More in next post.

  25. #25
    PowerPoster Ellis Dee
    Join Date
    Mar 2007
    Location
    New England
    Posts
    3,530

    Re: NEWBY Struggling with Conditional Text Merge Problem

    The most powerful aspect of udts is that they can contain other udts, including dynamic udt arrays. Consider the following:
    Code:
    Option Explicit
    
    Public Type EventType
        Tag As String
        Description As String 
        Priority As Long
    End Type
    
    Public Type PersonType
        Name() As EventType
        Birth() As EventType
        Death() As EventType
    End Type
    
    Private mtypPeople() As PersonType
    This rudimentary (but flexible!) structure would allow you to add people to the People array, and each individual person would get their own redim-able Name event array, Birth event array, and Death event array. Each of those event arrays can hold any number of different types of events. For example, the Birth array can hold places, dates, whatever. And each of those can be assigned their own priority, allowing you to massage however you'd like.
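    Filling one of those nested arrays needs a little care, because UBound raises run-time error 9 on a dynamic array that has never been ReDim'd. The sketch below uses hypothetical types modelled on EventType/PersonType, with an added BirthCount field to track how many events are stored, and names the first field Tag to stay clear of the Type keyword. ReDim Preserve on every add is approach 3 from post #13, which should be fine for the handful of events one person has.

```vb
Option Explicit

' Hypothetical types modelled on EventType/PersonType
Public Type EvtType
    Tag As String           ' e.g. "DATE" or "PLAC"
    Description As String   ' e.g. "18 Mar 1868"
    Priority As Long        ' quality score
End Type

Public Type PersonRec
    BirthCount As Long      ' number of filled slots in Birth()
    Birth() As EvtType
End Type

' Appends one event to a person's Birth() array, growing it by one slot.
Public Sub AddBirth(person As PersonRec, ByVal tagText As String, _
                    ByVal descText As String, ByVal score As Long)
    ReDim Preserve person.Birth(0 To person.BirthCount)
    With person.Birth(person.BirthCount)
        .Tag = tagText
        .Description = descText
        .Priority = score
    End With
    person.BirthCount = person.BirthCount + 1
End Sub
```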

    Another powerful aspect of udts is that if you have a non-array udt, you can save it to a file on disk with a single line of code, and read it back from disk into memory with another single line of code. Because udts can contain other udts, what I do whenever I want to persist udts is dump them all into a "main" udt called Data. Then saving/loading all the memory at once is a simple matter of saving/loading Data. For example:
    Code:
    Option Explicit
    
    Public Type EventType
        Tag As String
        Description As String 
        Priority As Long
    End Type
    
    Public Type PersonType
        Name() As EventType
        Birth() As EventType
        Death() As EventType
    End Type
    
    Private Type DataType
        People() As PersonType
    End Type
    
    Private data As DataType
    
    Private Sub SaveData(pstrFile As String)
        Dim FileNumber As Long
        
        FileNumber = FreeFile()
        Open pstrFile For Binary As #FileNumber
        Put #FileNumber, 1, data
        Close #FileNumber
    End Sub
    
    Private Sub LoadData(pstrFile As String)
        Dim FileNumber As Long
        
        FileNumber = FreeFile()
        Open pstrFile For Binary As #FileNumber
        Get #FileNumber, 1, data
        Close #FileNumber
    End Sub
    This is actual, functional code for saving and loading any data structure you can imagine to and from the hard drive with no thought required as to how big dynamic arrays are, how much space strings take up, or anything. It's as elegant as it gets.

    There are several nuances I've learned the hard way that I'm happy to share, but that's essentially it.

  26. #26
    PowerPoster Ellis Dee
    Join Date
    Mar 2007
    Location
    New England
    Posts
    3,530

    Re: NEWBY Struggling with Conditional Text Merge Problem

    The question is, what kind of data structure will serve the needs of the OP?

    My question remains: How many different kinds of main events are there? Not sub-events, like PLAC, but primary events, like BIRT. In other words, how many different "1"s are there?

    If there are only a handful, we can tailor a specific udt for each. If there are dozens, we'll want a single flexible data structure that can handle them all. So how many are there?
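    One way to get that count from the file itself is a single sequential pass that tallies each distinct level-1 tag. This sketch assumes CR or CRLF line endings and uses a late-bound Scripting.Dictionary (no project reference needed). CountLevel1Tags is a hypothetical name, and it counts level-1 tags in every record, including header and family records, not just individuals.

```vb
Option Explicit

' Returns a Dictionary mapping each level-1 tag (BIRT, DEAT, ...) to how
' many times it occurs. The file is read sequentially with Line Input.
Public Function CountLevel1Tags(ByVal path As String) As Object
    Dim fileNum As Integer
    Dim textLine As String
    Dim tagName As String
    Dim tags As Object
    Dim sp As Long

    Set tags = CreateObject("Scripting.Dictionary")
    fileNum = FreeFile
    Open path For Input As #fileNum
    Do While Not EOF(fileNum)
        Line Input #fileNum, textLine
        If Left$(textLine, 2) = "1 " Then
            tagName = Mid$(textLine, 3)
            sp = InStr(tagName, " ")
            If sp > 0 Then tagName = Left$(tagName, sp - 1)   ' drop any value after the tag
            If tags.Exists(tagName) Then
                tags(tagName) = tags(tagName) + 1
            Else
                tags.Add tagName, 1
            End If
        End If
    Loop
    Close #fileNum
    Set CountLevel1Tags = tags
End Function
```

    The dictionary's .Count then gives the number of distinct event types, and .Keys lists them.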

  27. #27
    PowerPoster dilettante
    Join Date
    Feb 2006
    Posts
    24,487

    Re: NEWBY Struggling with Conditional Text Merge Problem

    From the samples I've looked at nested UDTs might not cut it. The beast appears to be something closer to a redundantly serialized relational database. Even XML has no clean way to implement this since there are multiple possible hierarchies of entities. Different types of entities should have foreign-key links to other entity tables.

    An INDI may be a FAMC (child) of a given unique FAM, as well as being a FAMS (spouse) listed in his WIFE's FAM. He can also be a CHIL of another FAM, and a HUSB in his own FAM. Many INDIs could have been at the same EVEN, etc.

    So you have a number of one-to-one, one-to-many, many-to-one, and many-to-many relationships and the same INDI can be referenced within many other INDIs, etc. Then add in SOURs, PLACs, etc. etc.


    Like the nested UDT case, you can do a SHAPE query using the Data Shaping Service to return a hierarchical Recordset: a "Recordset of Recordsets." These are usually best displayed using a TreeView or a Hierarchical Grid. But those are only views of the data, you can't store all of the data in a simple tree structure.

    It appears that you can't even use a simple field for Birth Date, since there might be multiple reported dates of varying "quality" (and differing dates) just as with other attributes of an entity.

  28. #28

    Thread Starter
    Junior Member
    Join Date
    May 2010
    Posts
    19

    Re: NEWBY Struggling with Conditional Text Merge Problem

    Quote Originally Posted by Ellis Dee
    My question remains: How many different kinds of main events are there? Not sub-events, like PLAC, but primary events, like BIRT. In other words, how many different "1"s are there?
    Just to clear up a few questions: firstly, in the file I am dealing with there are 99 different Event Types. The worst case I saw recently had a lot of user-created Event Types, but even then the total was only ~240.

    Quote Originally Posted by dilettante
    An INDI may be a FAMC (child) of a given unique FAM, as well as being a FAMS (spouse) listed in his WIFE's FAM. He can also be a CHIL of another FAM, and a HUSB in his own FAM. Many INDIs could have been at the same EVEN, etc.
    You are quite correct that the family relationships defined after the individual blocks, within the "0 @F" tags, are a complex animal. I would not propose to deal with family events outside of the genealogy program itself; it is just too complicated, and changes would be impossible to validate.

  29. #29
    PowerPoster dilettante's Avatar
    Join Date
    Feb 2006
    Posts
    24,487

    Re: NEWBY Struggling with Conditional Text Merge Problem

    Changes do sound like a challenge. You might not just have to edit the text or values in an entity; you might have to change its link to another entity, or change it to another type. You could have a daughter who, on further information, turned out to be a son, a graduation that turned out to be something else, a given name that needs to be edited to be a surname, etc.

  30. #30
    PowerPoster Spoo's Avatar
    Join Date
    Nov 2008
    Location
    Right Coast
    Posts
    2,656

    Re: NEWBY Struggling with Conditional Text Merge Problem

    Quote Originally Posted by Ellis Dee View Post
    In that case, here's a quick primer on UDTs. ...
    Ellis

    Sweet. Thanks ..

    EDIT:
    Due to their nature, UDT arrays are self-documenting, unlike plain arrays where everything is an index and nothing makes sense without careful inspection.
    My workaround is to use "captioned" FlexGrids.
    But your point is taken.

    Spoo
    Last edited by Spoo; Jul 3rd, 2010 at 08:09 PM.

  31. #31
    PowerPoster Ellis Dee's Avatar
    Join Date
    Mar 2007
    Location
    New England
    Posts
    3,530

    Re: NEWBY Struggling with Conditional Text Merge Problem

    Quote Originally Posted by dilettante View Post
    From the samples I've looked at nested UDTs might not cut it. The beast appears to be something closer to a redundantly serialized relational database. Even XML has no clean way to implement this since there are multiple possible hierarchies of entities. Different types of entities should have foreign-key links to other entity tables.
    I tend to agree; the more I think about the data, the more I think it doesn't particularly fit a relational database. Recursive data is best held in a flat-file structure, IMO. The key is how you display it, because the data itself is mostly a mess. TreeViews (as you mention) are ideal for this.

  32. #32
    PowerPoster dilettante's Avatar
    Join Date
    Feb 2006
    Posts
    24,487

    Re: NEWBY Struggling with Conditional Text Merge Problem

    But a TreeView can only hold a "view" of the data based on the query you want to make at a given time.

    I would think users might need to create trees on demand: for example, picking an individual and exploring a "result tree" with that person at the root, with branches leading to his own info as well as to his ancestors, siblings, spouses (widows do remarry), children, etc., and their info. Trying to build one giant tree with Adam and Eve at the root would be unnavigable, and the data on hand most likely has many branches appearing "out of nowhere" and marrying into families.

    Your data would probably tend to have other things in there too, not just one person's family tree but several. Or "orphaned" data you're holding on suspicion until you have additional evidence to link it to a given tree.

    The trees aren't simple either. Over several generations families might re-merge, producing a "braided" structure within the tree when "5th cousins" intermarry.
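The "braided" case matters for any code that walks the tree: when cousins intermarry, the same ancestor is reachable by more than one path. A minimal Python sketch, assuming a hypothetical `parents` dict (individual ID to a tuple of parent IDs, e.g. derived from FAMC and HUSB/WIFE links), contrasts counting distinct ancestors with a visited set against counting pedigree-chart slots, where shared ancestors are counted once per path.

```python
from collections import deque

def ancestors(parents, start):
    """Distinct ancestors of `start`, expanding each person at most once.

    The `seen` set keeps a shared ancestor from being walked twice when two
    branches of the tree re-merge (and also stops loops caused by bad data).
    """
    seen = set()
    queue = deque(parents.get(start, ()))
    while queue:
        person = queue.popleft()
        if person in seen:
            continue  # already reached through another branch of the "braid"
        seen.add(person)
        queue.extend(parents.get(person, ()))
    return seen

def pedigree_slots(parents, start, generations):
    """Number of ancestor slots in a pedigree chart, counting every path."""
    if generations == 0:
        return 0
    return sum(1 + pedigree_slots(parents, p, generations - 1)
               for p in parents.get(start, ()))

# Hypothetical data: C's parents A and B are first cousins (P1 and P3 are siblings).
parents = {
    "C": ("A", "B"),
    "A": ("P1", "P2"),
    "B": ("P3", "P4"),
    "P1": ("G1", "G2"),
    "P3": ("G1", "G2"),
}
```

With this data, `ancestors(parents, "C")` finds 8 distinct people, while `pedigree_slots(parents, "C", 3)` returns 10 because G1 and G2 each occupy two slots in the chart.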


    It looks darned complicated.

  33. #33
    PowerPoster Ellis Dee's Avatar
    Join Date
    Mar 2007
    Location
    New England
    Posts
    3,530

    Re: NEWBY Struggling with Conditional Text Merge Problem

    Not particularly. Putting it into a flat-file approach that can point back to itself in any number of ways is quite simple to set up. Then populating a treeview given an arbitrary starting point is equally simple.
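A minimal Python sketch of that flat-file idea, under simplifying assumptions: a dict keyed by xref ID stands in for the flat file, only level-1 lines whose value is an `@...@` pointer (FAMC, FAMS, HUSB, WIFE, CHIL, and so on) are kept, and a nested `(id, [(tag, subtree), ...])` tuple stands in for the TreeView nodes. The function names are hypothetical.

```python
def parse_links(lines):
    """Flat lookup: xref ID -> list of (tag, target ID) from level-1 pointer lines."""
    records, current = {}, None
    for line in lines:
        parts = line.strip().split(" ", 2)
        if len(parts) < 2:
            continue
        if parts[0] == "0":
            # '0 @I1@ INDI' starts a record; '0 HEAD' / '0 TRLR' have no xref ID.
            current = parts[1] if parts[1].startswith("@") else None
            if current:
                records[current] = []
        elif parts[0] == "1" and current and len(parts) == 3 and parts[2].startswith("@"):
            records[current].append((parts[1], parts[2]))
    return records

def tree_from(records, root, depth):
    """Tree view rooted at any record, at most `depth` links deep.

    A node already on the current path is emitted as a leaf, so the
    INDI -> FAMS -> HUSB -> same INDI loop cannot recurse forever.
    """
    def walk(node, d, path):
        if d == 0 or node in path:
            return (node, [])
        return (node, [(tag, walk(target, d - 1, path | {node}))
                       for tag, target in records.get(node, [])])
    return walk(root, depth, frozenset())
```

Because the starting record is just a parameter, the same flat data can be re-rooted on whichever person the user picks, and the depth limit keeps each view small.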

    An exhaustive treeview-style display of all the data would be overly complex, if not outright impossible, for all the reasons you mention.

    A user-drawn approach might be better, and pretty cool. I can envision a PictureBox drawing a stereotypical family-tree type of thing: when you click on a name, it gets centered and the tree grows out from it to the edges of the box and just stops. Limiting the data displayed at any one time keeps it simple. Then you traverse the tree in any direction by clicking on a name, which in turn gets centered and the box redrawn; lather, rinse, repeat.

    (User-drawn in concept only; dynamic label and line controls would be easier than physically drawing everything.)

    Maybe clicking a name opens a popup window with the details for that person, i.e. birth date/place, etc., and double-clicking centers the name for traversing the tree.
    Last edited by Ellis Dee; Jul 4th, 2010 at 01:27 AM.

  34. #34

    Thread Starter
    Junior Member
    Join Date
    May 2010
    Posts
    19

    Re: NEWBY Struggling with Conditional Text Merge Problem

    Unfortunately, circumstances are preventing me from moving forward on this project just now.

    I have saved all the posts and just wanted to thank everyone who took the time to post their views and ideas in my time of need and ignorance.

    Whilst my original idea may have been very ambitious and difficult to achieve, I have come away with some very good ideas on how I can at least partly implement it, with a big gain for me. I also discovered WinMerge, which is a great way for me to check my logic and confirm that I am achieving the desired results.

    This forum and its members are a great resource. I would like to think that someday I could give something back; who knows?

    Thx again

  35. #35
    New Member
    Join Date
    Mar 2007
    Posts
    13

    Re: NEWBY Struggling with Conditional Text Merge Problem

    Lando -
    I'm new to this thread, so pardon me if you've already explained this.

    Why are you going through this tortuous exercise of parsing a GEDCOM file? GEDCOMs are meant for exchanging data between databases. Your best bet is to import it into a good database program, such as Wholly Genes' TMG program, which can then export the whole thing to Excel or other formats that can be imported directly into Access or another database, if that's what you want. It does this very painlessly and will preserve any Sureties that have been entered.

    If you just want to see what you have, then use GenViewerLite, a good free program.

    ( http://www.mudcreeksoftware.com/genviewer_lite.htm )

  36. #36

    Thread Starter
    Junior Member
    Join Date
    May 2010
    Posts
    19

    Re: NEWBY Struggling with Conditional Text Merge Problem

    I use a program called RootsMagic V4, which is a very good program. However, most programs stretch the Gedcom standard a little, so even importing into another platform packed with utilities would involve losing information.

    Most genealogy programs I know of will not merge this type of duplication automatically as it may be valid disputed information still to be proved.

    Having imported Gedcom files from various other researchers and programs, I have inherited loads of wrongly added information. If you are familiar with genealogy, you will also be familiar with problems like freehand notes in date fields, and occupations and other freehand text in the Place field.

    Running through the gedcom, I have been able to build a number of small routines that look for this wrongly entered data and either query it to the screen or fix it automatically. As you can imagine, this is tailored to what I see in front of me in the database concerned, but so far things have been working very well.
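One such check might look like the following Python sketch, which flags `2 DATE` values that don't look like a date. The regular expression is a deliberately simplified, hypothetical pattern (optional qualifier, optional day and month, then a year); the real GEDCOM date grammar also allows ranges (BET ... AND ...), periods (FROM ... TO ...) and more, so it would flag some valid dates too.

```python
import re

# Simplified, hypothetical date pattern: [ABT|BEF|AFT|EST|CAL] [[DD] MON] YYYY
DATE_RE = re.compile(
    r"^(?:(?:ABT|BEF|AFT|EST|CAL)\s+)?"
    r"(?:(?:\d{1,2}\s+)?(?:JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC)\s+)?"
    r"\d{3,4}$",
    re.IGNORECASE,
)

def suspicious_dates(lines):
    """Yield (line_number, value) for '2 DATE' lines whose value fails DATE_RE."""
    for n, line in enumerate(lines, start=1):
        parts = line.strip().split(" ", 2)
        if parts[:2] == ["2", "DATE"]:
            value = parts[2].strip() if len(parts) == 3 else ""
            if not DATE_RE.match(value):
                yield n, value
```

Reporting the line number rather than fixing in place matches the "query it to screen" option; an automatic fix would be a separate, more tailored step.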

    Identifying and merging duplicate events might be a bit ambitious at present, but I know I will be able to make progress here once my skill level improves, and achieve a merge routine that merges at least some of these duplicates.

  37. #37
    New Member
    Join Date
    Mar 2007
    Posts
    13

    Re: NEWBY Struggling with Conditional Text Merge Problem

    Lando:

    Understand your problem. I've been doing genealogy for some years and know what you're facing. There's absolutely no package that will map sloppily entered data to the proper fields, nor reconcile discrepancies. I've learned to handle the problem more or less manually, person by person, before I merge any new data from others with what I have. Sometimes this is merely a matter of using TMG's People Merge, which gives me a chance to combine or ignore data, but when all's said and done, it's necessary to go through person by person to evaluate the data, reconcile the discrepancies, and put the sources into proper format. Gedcoms are convenient but not a good tool, and I try to avoid them. Fortunately, TMG can import other programs' databases directly without going through the Gedcom route, but you're still faced with the problem of others' bad data entry and accuracy. Good luck.

  38. #38
    PowerPoster Spoo's Avatar
    Join Date
    Nov 2008
    Location
    Right Coast
    Posts
    2,656

    Re: NEWBY Struggling with Conditional Text Merge Problem

    Lando

    6rtury makes some very good points, but it's all kinda
    Greek to me as I have no GEDCOM experience at all.

    So, in the event that you are a glutton for punishment and
    are still interested in developing your own app, I spent a few
    moments writing some code that scans your file (post #22)
    and isolates all unique "1" tags, both in raw (original) order and sorted.

    See attached image for results.
    (FWIW, they are presented in a MSHFlexGrid)
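For anyone following along without the attachment, the same scan can be sketched in a few lines of Python. This is not Spoo's VB6 code; it is a hypothetical equivalent that returns the distinct level-1 tags in first-seen order and sorted.

```python
def level1_tags(lines):
    """Return (first_seen_order, sorted) lists of the distinct level-1 tags."""
    seen = {}
    for line in lines:
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "1":
            seen.setdefault(parts[1], None)  # dicts keep insertion order (Python 3.7+)
    raw = list(seen)
    return raw, sorted(raw)
```

Usage would be along the lines of `with open(path, encoding="utf-8") as f: raw, ordered = level1_tags(f)`, assuming the file is UTF-8; `len(raw)` then answers the "how many different 1s are there?" question directly.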

    I hope it is of some use to you.

    Spoo


  39. #39
    PowerPoster Spoo's Avatar
    Join Date
    Nov 2008
    Location
    Right Coast
    Posts
    2,656

    Re: NEWBY Struggling with Conditional Text Merge Problem

    Quote Originally Posted by Ellis Dee View Post
    The most powerful aspect of udts is that ...
    Ellis

    At the risk of hijacking this thread, I nonetheless tried
    the following per your post #25.

    1. In a .bas module

    Code:
    Public Type EventType
        Type As String
        Description As String
        Priority As String
    End Type
    
    Public Type PersonType
        Name() As EventType         ' << compile error -- expected expression
        Birth() As EventType
        Death() As EventType
    End Type
    
    Private Type DataType
        People() As PersonType
    End Type
    
    Private Data As DataType
    Changing Name() to Names() solves that issue.

    Did I do something wrong?
    Should Name() work?

    2. In a Form

    Code:
    Private Sub LoadData(pstrFile As String)
        Dim FileNumber As Long
        FileNumber = FreeFile()
        Open pstrFile For Binary As #FileNumber
        Get #FileNumber, 1, Data     ' << run-time error '458'
        Close #FileNumber   ' close only the file opened above
    End Sub
    .. where pstrFile = "D:\VBForum Stuff\Txt Files\lando geneo.txt"

    I get "Variable uses an Automation type not supported by Visual Basic"
    as the error description.

    What am I missing here?

    Spoo

  40. #40
    Addicted Member
    Join Date
    Oct 2009
    Posts
    164

    Re: NEWBY Struggling with Conditional Text Merge Problem

    My question is about the 'sorting' he wishes to do...

    Is the 'date quality' basically a gauge of how reliable or accurate the date may be?
    And you want to keep records with date reliability above a certain level?
    I also assume you always want to work with a whole individual at a time (not separate BIRT records).
    Side question: was this data format originally (or is it still) COBOL-based? (I see no need to convert this file or move it to a DB just for your current problem. It is a completely valid and understandable format.)

    Please feel free to correct me if my understanding is wrong,
    but shouldn't this be much simpler than huge arrays and UDTs?
    Those methods seem to want to read the whole file at once.

    How I would approach this:
    Read one individual into a string, char by char. Use @ to mark the start/end of an individual (or the group of chars "0 @I", as in "0 @I1").
    Work with that string. Since it contains a single individual, you CAN parse it into a UDT as Ellis suggests. If the problem were larger, I would use a UDT, but I don't think it's needed for this.
    Output the string to a specific file, based on whether it meets your quality check. If good enough, write to FileA; if not good enough, write to FileB.
    Read the next individual, etc.
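A minimal Python sketch of that read-one-individual-at-a-time loop. Two assumptions differ from the post: it splits on lines beginning with "0 " rather than reading char by char, and it copies non-INDI records (HEAD, FAM, TRLR, ...) to the "good" output unchanged. `split_records`, `route` and the `is_good` callback are hypothetical names; `is_good` is where a date-quality check would plug in.

```python
def split_records(lines):
    """Yield each level-0 record ('0 @I1@ INDI' plus its sub-lines) as a list of lines."""
    block = []
    for line in lines:
        if line.startswith("0 ") and block:
            yield block
            block = []
        block.append(line)
    if block:
        yield block

def route(lines, is_good, out_good, out_bad):
    """Write each INDI record to out_good or out_bad depending on is_good(block).

    Only one record is held in memory at a time; the source is read once, read-only.
    """
    for block in split_records(lines):
        parts = block[0].split()
        is_indi = len(parts) >= 3 and parts[2] == "INDI"
        target = out_bad if is_indi and not is_good(block) else out_good
        target.writelines(block)
```

With real files, `lines` would be the open source file and `out_good`/`out_bad` the two output files (FileA and FileB in the post), all opened with the same encoding.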

    Since the source file is used read-only, even if things go south, you won't lose anything by keeping the file open the whole time.

    As far as determining whether a date is good enough, my thought is to assign point values to the parts of the date that improve its quality:
    1 for month, 1 for year, 2 for a day, but only 1 if 'abt'.
    Then you can sort depending on whether intTotalScore is > 3 (for example).

    The one downside to this is that the Day() function assumes 1 if no day is present, so you can't tell whether it is 1 or none... I'll get back to you if I think of a better way. I still suggest a point system, though.
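One way to sidestep the Day() ambiguity is to score the date text token by token instead of converting it to a date first. The Python sketch below uses the point values above (year 1, month 1, day 2). Subtracting 1 for 'abt' is one reading of "only 1 if 'abt'" and is an assumption; it reproduces the ordering in the opening post (18 Mar 1868 > Mar 1868 > 1868 > abt 1868). `merge_events` is a hypothetical helper that applies the opening post's merge rule: keep the best date, keep the PLAC with the most comma-separated parts, and refuse to merge when the years disagree.

```python
import re

MONTHS = {"JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"}

def date_score(value):
    """Points for a DATE value: year 1, month 1, day 2, and -1 for 'abt' (assumed)."""
    tokens = value.upper().split()
    score = 0
    if tokens and tokens[0] == "ABT":
        score -= 1
        tokens = tokens[1:]
    if tokens and re.fullmatch(r"\d{3,4}", tokens[-1]):
        score += 1                                   # year
        if len(tokens) >= 2 and tokens[-2] in MONTHS:
            score += 1                               # month
            if len(tokens) >= 3 and re.fullmatch(r"\d{1,2}", tokens[-3]):
                score += 2                           # day: only counted if really present
    return score

def year_of(value):
    """Last token if it looks like a year, else None."""
    tokens = value.split()
    return tokens[-1] if tokens and re.fullmatch(r"\d{3,4}", tokens[-1]) else None

def merge_events(events):
    """Merge duplicate events (dicts with optional 'DATE'/'PLAC'); None if years conflict.

    Ties go to the first event seen, because max() returns the first maximal item.
    """
    dates = [e["DATE"] for e in events if e.get("DATE")]
    if len({year_of(d) for d in dates} - {None}) > 1:
        return None                                  # conflicting years: leave for a human
    merged = {}
    if dates:
        merged["DATE"] = max(dates, key=date_score)
    places = [e["PLAC"] for e in events if e.get("PLAC")]
    if places:
        merged["PLAC"] = max(places, key=lambda p: p.count(","))
    return merged
```

Fed the four BIRT events from the opening post, `merge_events` keeps "18 Mar 1868" and "Sullivan County, NY, United States", which is the single event the thread starter wanted.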
