-
NEWBY Struggling with Conditional Text Merge Problem
I hope someone can take some time out of their busy schedule to point me in the right direction on how to tackle this project. My immature programming head is just not grasping it at present; maybe I am overthinking it! :confused:
I basically want to construct a conditional merge routine for a plain-text file. One field contains a date, to which I would like to assign a “quality” factor. Another field contains place detail, where I would use the number of comma delimiters as an indication of quality: more commas mean more components and therefore more detail. An example portion of the original file structure is below; this is a GEDCOM file (genealogy information).
Code:
0 @I1@ INDI ‘ start of information for individual#1
..
1 BIRT
2 DATE 1868 ‘ third highest quality score date
2 PLAC New York, America
..
1 BIRT
2 DATE Mar 1868 ‘ second highest quality score date
1 BIRT
2 DATE 18 Mar 1868 ‘ highest quality score date
2 PLAC America
..
1 BIRT
2 DATE abt 1868 ‘ fourth highest quality score date
2 PLAC Sullivan County, NY, United States
..
0 @I2@ INDI ‘ start of information for individual#2
Constants are :-
“0 @I” starts each new person’s information block.
“1” starts each new event within the “0 @I” tags, in this case “1 BIRT” (birth)
The number of lines appearing between the “0 @I” tags and the “1 ” tags is totally variable. The “2 DATE” and “2 PLAC” lines may or may not appear.
Code:
1 BIRT
2 DATE 18 Mar 1868 ‘most accurate date and same year as others
2 PLAC Sullivan County, NY, United States
What I would like to work towards in this case is to reconstruct the information into one Birth event with the most accurate information, like the example above, and discard the additional lower-quality information. Unfortunately, at the moment I don’t have the experience to jump towards the right solution.
My thought was to read the lines between the “0 @I” tags into a fileINPUT array, reconstruct the information in a fileOUTPUT array, write that back to an output file, and also write the changed information to a log file for auditing.
This is maybe far too ambitious for me at my stage of learning, but no pain no gain. I believe that once I get some experienced direction on how I should be tackling this problem, and maybe a short example, I will start to achieve some real results.
In hope of your indulgence and a bit of patience from a family historian. (and newby programmer) :o
-
Re: NEWBY Struggling with Conditional Text Merge Problem
I would approach it like this:
1) Read in the entire contents of the file to a single string variable
2) Use the Split(<string>, vbNewLine) function to create an array of lines from the string variable
3) Iterate the array, one line at a time, parsing out the data as desired
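For what it's worth, a minimal sketch of those three steps might look like the following. ReadAllLines and DumpDates are hypothetical names, the path is a placeholder, and it assumes the file uses CR+LF line endings (a file with bare LF endings would need vbLf in the Split instead).

```vb
Option Explicit

' Step 1 and 2: read the whole file into one string, then split it into lines.
' ReadAllLines is a made-up helper name, not anything built into VB.
Public Function ReadAllLines(ByVal pstrFile As String) As String()
    Dim intFile As Integer
    Dim strAll As String

    intFile = FreeFile
    Open pstrFile For Binary Access Read As #intFile
    strAll = Space$(LOF(intFile))   ' buffer the size of the file
    Get #intFile, , strAll          ' one read fills the buffer
    Close #intFile

    ReadAllLines = Split(strAll, vbNewLine)
End Function

' Step 3: walk the array one line at a time and pick out what you need.
Public Sub DumpDates()
    Dim astrLines() As String
    Dim i As Long

    astrLines = ReadAllLines("C:\data\sample.ged")   ' placeholder path
    For i = 0 To UBound(astrLines)
        If Left$(astrLines(i), 6) = "2 DATE" Then
            Debug.Print Mid$(astrLines(i), 8)        ' text after "2 DATE "
        End If
    Next
End Sub
```

This keeps the whole file in memory twice for a moment (once as the string, once as the array), so it suits small to medium files best.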
-
Re: NEWBY Struggling with Conditional Text Merge Problem
Thanks. I should have given some idea of the size of the file: 69 MB, with over 100K individuals. That's why I was going to attack it chunk by chunk.
I must look more into the use of the Split and Join functions though.
-
Re: NEWBY Struggling with Conditional Text Merge Problem
You can also split the file on "0 @I", so you would have an array with each person as an element.
To score, you can use something like:
vb Code:
' for dates
score = UBound(Split(Mid(mystr, 8)))
' or for places
score = UBound(Split(Mid(mystr, 8), ",")) + 1
where mystr is a date or place line. You would need to test whether a delimiter exists to avoid an error,
something like:
vb Code:
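If InStr(mystr, "PLAC") > 0 Then
    If InStr(mystr, ",") > 0 Then
        ' count the comma-separated components
        score = UBound(Split(Mid(mystr, 8), ",")) + 1
    ElseIf Len(mystr) < 9 Then
        ' nothing useful after the "2 PLAC " tag
        score = 0
    Else
        ' a single component with no commas
        score = 1
    End If
End If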
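One thing to watch with counting the space-separated parts of a date: "abt 1868" and "Mar 1868" both have two parts, so they would score the same. A rough sketch of a scorer that ranks a qualified date just below the same date without the qualifier is below. DateScore is a hypothetical name, and the list of qualifier words is my assumption about what might turn up in the file.

```vb
Option Explicit

' Hypothetical helper: rates the text that follows "2 DATE ".
' A higher number means a more precise date; 0 means no usable date.
Public Function DateScore(ByVal strDate As String) As Long
    Dim astrParts() As String
    Dim lngParts As Long

    strDate = Trim$(strDate)
    If Len(strDate) = 0 Then Exit Function   ' empty value scores 0

    astrParts = Split(strDate, " ")
    lngParts = UBound(astrParts) + 1         ' 1 = yyyy, 2 = mmm yyyy, 3 = dd mmm yyyy

    Select Case LCase$(astrParts(0))
        Case "abt", "est", "cal", "bef", "aft"   ' assumed qualifier words
            ' Ignore the qualifier word, then rank one step below
            ' the same date without a qualifier.
            DateScore = (lngParts - 1) * 2 - 1
        Case Else
            DateScore = lngParts * 2
    End Select

    If DateScore < 0 Then DateScore = 0      ' a lone qualifier with no date
End Function
```

With the four sample dates from post #1, the full date scores highest, then month and year, then year only, then the "abt" year, which matches the order Lando described.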
-
Re: NEWBY Struggling with Conditional Text Merge Problem
Lando
I was initially puzzled by why BIRT was always "blank", but
I think I get it now .. the following is a repeat of your example,
indents added for illustrative purposes.
Code:
0 @I1@ INDI ‘ start of information for individual#1
..
  1 BIRT
    2 DATE 1868 ‘ third highest quality score date
    2 PLAC New York, America
..
  1 BIRT
    2 DATE Mar 1868 ‘ second highest quality score date
  1 BIRT
    2 DATE 18 Mar 1868 ‘ highest quality score date
    2 PLAC America
..
  1 BIRT
    2 DATE abt 1868 ‘ fourth highest quality score date
    2 PLAC Sullivan County, NY, United States
..
0 @I2@ INDI ‘ start of information for individual#2
It kinda makes more sense, visually, to me now.
- Raw source data
  - Each "1 " tag represents a different source regarding BIRT data.
  - Most have a DATE, only some have a PLAC
  - The ".." seems to represent where other "1 " tags could appear, possibly such as DEAT. Thus, for now, ignore.
- Output goal: for a given individual, "pick" best DATE and PLAC info
  - DATE and PLAC do not necessarily need to be from same source
- DATE quality algo
  - 1 .. 18 Mar 1868 --- full dd/mmm/yyyy
  - 2 .. Mar 1868 --- mmm/yyyy
  - 3 .. 1868 --- yyyy
  - 4 .. abt 1868 --- qualified yyyy
- PLAC quality algo - count commas
  - 1 .. [2] Sullivan County, NY, United States
  - 2 .. [1] New York, America
  - 3 .. [0] America
  - 4 .. [?] <empty>
OK.. now, to cogitate for a bit.
Feel free to jump in and correct me where I'm wrong.
Spoo
-
Re: NEWBY Struggling with Conditional Text Merge Problem
- Each "1 " tag represents a different source regarding BIRT data.
CORRECT
- Most have a DATE, only some have a PLAC
Most have DATE and PLAC, but not always.
- The ".." seems to represent where other "1 " tags could appear, possibly such as DEAT. Thus, for now, ignore.
Sorry for the confusion; the ".." is just my separator. Many more "2 " and even "3 " lines might actually appear there. Each Birth, Death, Marriage or Occupation EVENT always starts with a "1 " line, and is always immediately followed by DATE and then PLAC when they hold data. If the data does not exist, then "2 DATE" or "2 PLAC" or both may not appear.
- Output goal: for a given individual, "pick" best DATE and PLAC info
CORRECT
- DATE and PLAC do not necessarily need to be from same source
CORRECT; the goal would be to create one EVENT with the best quality information from multiple EVENTS if they exist.
1 BIRT
2 DATE best quality date
2 PLAC best quality place
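As a rough sketch of that goal, the routine below takes the lines for one individual and returns a single merged BIRT event. MergeBirths is a hypothetical name. It assumes the lines have no leading spaces, and it scores with the simple "count the parts" idea from post #4, so an "abt" date is not penalised here.

```vb
Option Explicit

' Sketch only: merges every "1 BIRT" event found in one person's lines
' into one event holding the best-scoring DATE and PLAC.
Public Function MergeBirths(astrLines() As String) As String
    Dim i As Long
    Dim blnInBirth As Boolean
    Dim strValue As String
    Dim strBestDate As String, strBestPlace As String
    Dim lngBestDate As Long, lngBestPlace As Long
    Dim lngScore As Long

    lngBestDate = -1      ' -1 means "nothing found yet"
    lngBestPlace = -1

    For i = LBound(astrLines) To UBound(astrLines)
        ' Any level 0 or level 1 line starts a new record or event
        If Left$(astrLines(i), 2) = "0 " Or Left$(astrLines(i), 2) = "1 " Then
            blnInBirth = (Left$(astrLines(i), 6) = "1 BIRT")
        End If

        If blnInBirth Then
            strValue = Trim$(Mid$(astrLines(i), 8))
            If Left$(astrLines(i), 6) = "2 DATE" Then
                lngScore = UBound(Split(strValue, " "))   ' more parts = better
                If lngScore > lngBestDate Then
                    lngBestDate = lngScore
                    strBestDate = strValue
                End If
            ElseIf Left$(astrLines(i), 6) = "2 PLAC" Then
                lngScore = UBound(Split(strValue, ","))   ' more commas = better
                If lngScore > lngBestPlace Then
                    lngBestPlace = lngScore
                    strBestPlace = strValue
                End If
            End If
        End If
    Next

    MergeBirths = "1 BIRT"
    If lngBestDate >= 0 Then MergeBirths = MergeBirths & vbNewLine & "2 DATE " & strBestDate
    If lngBestPlace >= 0 Then MergeBirths = MergeBirths & vbNewLine & "2 PLAC " & strBestPlace
End Function
```

Note what this does not do: it does not check that the competing dates agree with each other (same year, for instance), and it drops any other sub-lines under the BIRT events, so both would need handling before trusting it on real data.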
-
Re: NEWBY Struggling with Conditional Text Merge Problem
Lando
In addition to what has already been suggested,
here is an alternative. It is a rather "brute force"
approach that uses 3 arrays.
It will do the:
- parsing -- aaRAW()
- segmenting -- aaDATE() and aaPLAC() (eg, keep all dates for INDI 1 in "1 place")
- it does not tackle the "rating" step yet ... baby steps .. :)
Code:
' 1. dump txt file into array
Dim aaRaw() As String
Dim aaDATE() As String, aaPLAC() As String
Dim size1 As Long, size2 As Long
Dim ff As Integer
Dim ee As Long, nid As Long, nn As Long, ii As Long
Dim dct As Long, pct As Long
Dim fn As String, zz As String
'
size1 = 13 ' possibly set to 1,000,000 - for aaRAW()
size2 = 4 ' possibly set to 100 - for aaDATE and aaPLAC 2nd dimension
fn = "D:\VBForum Stuff\Txt Files\lando geneo.txt"
ff = FreeFile ' next free file number instead of a hard-coded #1
Open fn For Input As #ff
ee = 0
nid = 0
ReDim aaRaw(size1)
Do While Not EOF(ff)
    Line Input #ff, aaRaw(ee)
    ' count level-0 lines (INDI's), for aaDATE and aaPLAC 1st dimension
    nid = nid + IIf(Left(aaRaw(ee), 1) = "0", 1, 0)
    ee = ee + 1
Loop
Close #ff
' 2. segment by individual
ReDim aaDATE(nid - 1, size2), aaPLAC(nid - 1, size2)
nn = -1
For ii = 0 To ee - 1
    ' incr INDI
    If Left(aaRaw(ii), 1) = "0" Then
        nn = nn + 1
        dct = 0
        pct = 0
    End If
    zz = Left(aaRaw(ii), 6)
    ' populate aaDATE for this INDI
    If zz = "2 DATE" Then
        aaDATE(nn, dct) = Mid(aaRaw(ii), 8)
        dct = dct + 1
    ' populate aaPLAC for this INDI
    ElseIf zz = "2 PLAC" Then
        aaPLAC(nn, pct) = Mid(aaRaw(ii), 8)
        pct = pct + 1
    End If
Next ii
I hope this gives you some ideas. Holler if you have
any questions.
EDIT:
Just saw your post .. you beat me by 2 minutes !
I think we're on the same page. More later.
Time for Letterman,
Attached image shows:
- aaRAW()
- aaDATE() - aaDATE(0) contains all dates for INDI 1
- aaPLAC() - aaPLAC(0) contains all places for INDI 1
Spoo
-
Re: NEWBY Struggling with Conditional Text Merge Problem
Thanks, time for bed here, just coming off night shift...
I have attached an example GEDCOM with only one individual and various duplications. This is what I am trying to eradicate, and I'm sure you will get the idea of how the tags are laid out.
Each EVENT runs from "1 " tag to next "1 " tag e.g BIRT is the identifier. I am primarily only interested in the 3 or 4 lines after the EVENT start (BIRT) in this example.
Thx and night night zzzzzzzzzzzzz
-
Re: NEWBY Struggling with Conditional Text Merge Problem
Lando
I tried to look at your attached file, but nothing happens.
That is, it turns into a .GED file, but when I try to move
it from zip folder to another folder (by dragging), nothing
appears. Perhaps you could make a copy of it and rename
it using a .TXT extension.
Spoo
-
Re: NEWBY Struggling with Conditional Text Merge Problem
Morning, I have just recreated another Gedcom file as txt and attached it, I have split it to help explanation and annotated it.
This sort of thing might be a piece of cake to you but although I could fix it with copy&paste (although it would take a while to work through 100k records), the programming is a bit beyond me at present. :o
So far I have been able to read the input file sequentially, look for (for example) "2 PLAC", modify it and output it again. Then I moved on to the first 3 lines of any event, but this is a quantum leap for me.
I am sure when I get the fundamentals of how I should be approaching this I can work to understand it and tailor things to my own needs.
Thanks for your time and consideration, I really do appreciate it.
-
Re: NEWBY Struggling with Conditional Text Merge Problem
Lando
OK, thanks, that one I can read just fine.
But first, some questions:
- Where do you stand regarding your OP (original post #1)?
- still have questions
- solved
- solved but now going to the "next" level per attached file
- Which approach(es) did you adopt?
- Ellis Dee's -- post #2
- Westconn's -- post #4
- mine -- post #7
- Did you manage to read the entire 69 Meg file using any of the approaches?
Spoo
-
Re: NEWBY Struggling with Conditional Text Merge Problem
As you can see now my original post was very truncated information but still did contain the important fields which start any new event.
I have not adopted any of the approaches yet for this problem. Ellis Dee's approach would have been fine on the example I gave, but to me this is a much bigger task. Maybe not, just ignorance on my part!
Westconn1's approach was what my immature programming mind was originally thinking: read each person between the "0 @" tags into an array. That still leaves me the problem of how to reconcile the info, which does not fall within my experience to date.
Yours is looking good, although I would never have thought in array dimensions of this size (ignorance again, eh!), but that is another thing I have learnt.
The small routines I have been creating work on only 1 line, or maybe a set of 3 consecutive lines. For example, some PLAC lines contain “Belfast vol 1 page 123 Sep Qtr” while the date contains the year only. In this case I look for a PLAC containing Sep Qtr with InStr and amend the date from "2 DATE 1888" to "2 DATE SEP Q 1888", making it more accurate; this is a valid date format as recognized by my genealogy program.
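A routine of that kind could be sketched roughly as below. AmendQuarterDate is a hypothetical name, only the "Sep Qtr" case from the example is handled, and the "SEP Q" output format is simply the one described above rather than anything checked against the GEDCOM standard.

```vb
Option Explicit

' Sketch: if the DATE line holds a bare 4-digit year and the PLAC line
' mentions "Sep Qtr", return the amended DATE line; otherwise return the
' DATE line unchanged.
Public Function AmendQuarterDate(ByVal strDateLine As String, _
                                 ByVal strPlaceLine As String) As String
    Dim strYear As String

    AmendQuarterDate = strDateLine
    strYear = Trim$(Mid$(strDateLine, 8))        ' text after "2 DATE "

    If strYear Like "####" Then                  ' exactly four digits
        If InStr(1, strPlaceLine, "Sep Qtr", vbTextCompare) > 0 Then
            AmendQuarterDate = "2 DATE SEP Q " & strYear
        End If
    End If
End Function
```

Returning the original line when nothing matches makes it safe to call on every DATE/PLAC pair and write out whatever comes back.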
I'm afraid that is my level at present, although I am learning all the time from reading this forum and the contributions from guys like you. I don't expect ever to be an expert programmer, but I know there is a lot more I can learn.
-
Re: NEWBY Struggling with Conditional Text Merge Problem
Lando
Quote:
... although I would never have thought in array dimensions of this size ...
Haha .. If you are referring to my 1,000,000 element
array, well, I did say "brute force" method, didn't I?
Let's just take a moment to explore that.
I'd say that there are 4 basic approaches:
- undersized -- will get Run-time error '9' - Subscript out of range if array is too small
- massively oversize -- to avoid such a run-time error (my 1,000,000 idea)
- use ReDim Preserve each time -- efficient (RAM-wise), but slows things down
- use ReDim Preserve at intervals (say 5000) -- efficient and fast
ReDim (without the Preserve keyword) wipes out any existing
contents of the array.
- If you are doing this at the outset, then no problems
- If you are doing this when the array does contain data, big problem; hence the use of Preserve.
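The "ReDim Preserve at intervals" option could be sketched like this. LoadLines and CHUNK are hypothetical names, and the step of 5000 is just the figure mentioned above, not a tuned value.

```vb
Option Explicit

Private Const CHUNK As Long = 5000   ' grow step; arbitrary

' Loads every line of a text file into astrRaw(), growing the array in
' blocks of CHUNK elements. Returns the number of lines read.
Public Function LoadLines(ByVal pstrFile As String, astrRaw() As String) As Long
    Dim intFile As Integer
    Dim lngCount As Long

    ReDim astrRaw(CHUNK - 1)             ' plain ReDim is fine: nothing to keep yet
    intFile = FreeFile
    Open pstrFile For Input As #intFile
    Do While Not EOF(intFile)
        If lngCount > UBound(astrRaw) Then
            ' Array is full: add another block and keep what is there
            ReDim Preserve astrRaw(UBound(astrRaw) + CHUNK)
        End If
        Line Input #intFile, astrRaw(lngCount)
        lngCount = lngCount + 1
    Loop
    Close #intFile

    ' Trim the unused tail (an empty file leaves the array at its initial size)
    If lngCount > 0 Then ReDim Preserve astrRaw(lngCount - 1)
    LoadLines = lngCount
End Function
```

The caller declares a dynamic array (Dim astrRaw() As String) and passes it in; the returned count says how many elements actually hold data.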
There is one caveat regarding ReDim Preserve: you can only
resize the upper bound of the last dimension.
So, if you have a 1-D array (such as proposed aaRaw) -- no issues
But, if you have a 2-D array (such as proposed aaDATE) -- potential issue !!
We can explore the "potential issue" thingy further, if you are interested.
Spoo
-
Re: NEWBY Struggling with Conditional Text Merge Problem
You see I read a lot of stuff here where folks are optimizing for speed, for me it’s not really an issue as this will be an infrequently run routine. All I want to achieve is a routine I can understand, tweak and work with.
I had read about ReDim and Redim Preserve but have not thought about 2-D arrays yet and did not understand the potential issues.
For this type of project, do you believe the 2-D array approach is the best? I suppose, since this is a very infrequent run, I could read through the file on a pre-run and collect some data to help dimension the arrays?
-
Re: NEWBY Struggling with Conditional Text Merge Problem
Spoo
I am in work at present and still playing catch-up. I read your post #7 again with a clearer head and it is very interesting, although I suppose we now have the extra element of various EVENT types as well, so another array?
I was not familiar with the IIf function (ignorance again, who said it was bliss?) but now I can see how it works.
Obviously being in work I can't be trying any of this but I will do when I go off shift tomorrow and get some sleep.
BTW, I am on the right coast also, just not your right coast :D
-
Re: NEWBY Struggling with Conditional Text Merge Problem
This is the type of thing that screams for a UDT array instead of a two-dimensional array.
How many different kinds of data types are there?
(And no, you don't want to read the whole 69mb file in at once as I originally suggested. After a few megs you're better off reading sequentially.)
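A sequential pass that only ever holds one record in memory might be sketched as follows. ProcessFile and ProcessBlock are hypothetical names, and ProcessBlock here is only a placeholder that returns the record unchanged; the actual merge logic would go inside it.

```vb
Option Explicit

' Copies pstrIn to pstrOut one level-0 record at a time.
' A record is every line from one "0 " line up to the next.
Public Sub ProcessFile(ByVal pstrIn As String, ByVal pstrOut As String)
    Dim intIn As Integer, intOut As Integer
    Dim strLine As String
    Dim strBlock As String

    intIn = FreeFile
    Open pstrIn For Input As #intIn
    intOut = FreeFile                    ' different number: the first is now in use
    Open pstrOut For Output As #intOut

    Do While Not EOF(intIn)
        Line Input #intIn, strLine
        If Left$(strLine, 2) = "0 " And Len(strBlock) > 0 Then
            ' A new record starts: finish and write the previous one
            Print #intOut, ProcessBlock(strBlock);
            strBlock = ""
        End If
        strBlock = strBlock & strLine & vbNewLine
    Loop
    ' Write the final record, which has no following "0 " line
    If Len(strBlock) > 0 Then Print #intOut, ProcessBlock(strBlock);

    Close #intIn
    Close #intOut
End Sub

' Placeholder hook: returns the record unchanged.
Private Function ProcessBlock(ByVal strBlock As String) As String
    ProcessBlock = strBlock
End Function
```

Because the placeholder changes nothing, a first run should produce an output file identical to the input, which is an easy way to confirm the plumbing works before adding any real logic.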
-
Re: NEWBY Struggling with Conditional Text Merge Problem
Ellis Dee, you will see an example of the file structure attached to post#10.
For someone at my stage in VB it's melting my head, although not as much as it was yesterday, so that's good. :)
-
Re: NEWBY Struggling with Conditional Text Merge Problem
Quote:
Originally Posted by
Lando
You see I read a lot of stuff here where folks are optimizing for speed, for me it’s not really an issue as this will be an infrequently run routine. All I want to achieve is a routine I can understand, tweak and work with.
I had read about ReDim and Redim Preserve but have not thought about 2-D arrays yet and did not understand the potential issues.
For this type of project, do you believe the 2-D array approach is the best? I suppose, since this is a very infrequent run, I could read through the file on a pre-run and collect some data to help dimension the arrays?
Point taken about infrequently run app.
As for is "the 2-D array approach best" .. if you were thinking of using
my approach, keep these points in mind:
- aaRaw() -- the huge array -- it is proposed to be only 1-D.
- aaDATE() and aaPLAC() -- these were proposed to be 2-D
- my "way" of thinking about arrays is to think of an Excel spreadsheet (as in the R1C1 reference style)
- this is definitely not the ONLY way .. flipping things 90 degrees has its strong points
- anyway, using the R1C1 approach, in the attached image in my post #7, I proposed:
- 1st dim is rows -- one per INDI (ie, per individual)
- 2nd dim is cols -- one per source
- if you do a complete "read" of the 69 Meg data file per my step 1, you will know the number of INDIs
- hence you can specify 'number of INDIs' as 1st dim, and it won't change
- 2nd dim can be ReDim Preserved as needed, with no harmful effects.
Hope that isn't going overboard !
Spoo
-
Re: NEWBY Struggling with Conditional Text Merge Problem
Lando
Reading Ellis's post #16 and your reply in #17, it appears that
you are now going down a different route, seeing as he and I
are proposing differing approaches.
I'll back off, therefore, to avoid confusion.
Holler if you want me to continue.
EDIT:
FWIW, I've used the "brute force" approach on 90 Meg CSV files
consisting of 2,000,000 lines of data. Piece of cake. Takes around
20 seconds to go through the entire file.
Spoo
-
Re: NEWBY Struggling with Conditional Text Merge Problem
This is one horrendous, nasty file format for something meant to be for machine to machine data exchange. Worse yet they appear to often be encoded using some funky proprietary character encoding instead of ASCII or Unicode from what I've read.
You might try searching for one of several free format converter utilities out there. You'll be far ahead after first converting this into a Jet MDB or some other database format. Avoid XML or the darned thing will just get enormous on you.
-
Re: NEWBY Struggling with Conditional Text Merge Problem
Quote:
Originally Posted by
Lando
Ellis Dee, you will see an example of the file structure attached to post#10.
Yes, downloading that is what prompted my question. I see a bunch of data types in that sample data snippet. I was wondering if there were an exhaustive list of data types?
dilettante's idea to look for a file converter (preferably to Access) is solid. That could help simplify this process by a fair amount.
Apart from an off-the-shelf converter, a raw data converter isn't particularly hard to write from scratch. It's the next step of massaging that data where it gets tricky.
-
Re: NEWBY Struggling with Conditional Text Merge Problem
Spoo, please come back I did not intend to change any direction, I thought maybe with the different elements that the UDT array was logical. :o
dilettante, this is the gedcom standard between genealogy programs and not what is used day to day, just an easier format for me to work with.
Ellis Dee, I knew massaging the data was going to be tricky but once I get some kind of basis to work on that I understand I can work with the logic myself.
All, thanks for taking time to work with me, I have attached another txt file which maybe helps explain better what I am trying to achieve for, in this instance, the Birth Event.
-
Re: NEWBY Struggling with Conditional Text Merge Problem
Lando
LOL .. I'm back !!
Not being one who has ever worked with UDTs, I stand
to learn something about them along with you !
Spoo
-
Re: NEWBY Struggling with Conditional Text Merge Problem
In that case, here's a quick primer on UDTs. I'll use generic data not specific to this project because I still don't see how it all fits together yet.
Think of a UDT array like a database table or a spreadsheet. Each element in the array acts as one row. Each piece of that row can be typed differently and strongly, unlike with a two-dimensional array, where either every piece of data must be the same type or every piece of data must be a Variant. Due to their nature, udt arrays are self-documenting, unlike arrays where everything is an index and nothing makes sense without careful inspection.
That's the basics, and even that alone is quite powerful. It gets better, though. Much better. But first, here's an example of how a simple udt array might hold, say, employee info.
First you put the udt definition at the top of a module, like you would constants. These are often public definitions, and if so they must go in a bas module. They can, however, be private inside forms or classes. In that case, the form or class must encapsulate them completely. In other words, no other module/form/class can directly reference the udt definition in any way. (You can't pass one as a parameter to a function in the class or form.)
Code:
Option Explicit
Public Type EmployeeType
    EmployeeID As Long
    Name As String
    Title As String
    Phone As String
    Email As String
End Type
Once you have your definition, you can now declare variables (or arrays, usually) of that type. The individual elements in the udt act just like properties.
Code:
Dim typEmployee As EmployeeType
typEmployee.EmployeeID = 1234
typEmployee.Name = "John Smith"
typEmployee.Title = "Drone"
The With...End With statement can be quite useful with udts, but avoid prematurely leaving a With...End With block using GoTo, Exit Do, or Exit For.
Defining a udt as a dynamic array lets you turn it into an honest-to-god table, right in memory.
Code:
Dim typEmployee() As EmployeeType
Dim i As Long
ReDim typEmployee(100)
For i = 0 To 100
    With typEmployee(i)
        Debug.Print .EmployeeID & ": " & .Name
    End With
Next
Note that udts free you from having to worry about reserved words like Name. And as a general rule I never use prefixes in udt elements, just like I don't use prefixes when I name fields in databases.
Now we have the means to create an entire relational database, with a separate udt array for each table and each table having an ID field we can search on. Sorting must be done manually; I usually use Combsort (link in signature) for sorting udt arrays because that algorithm is efficient and easy to adapt.
More in next post.
-
Re: NEWBY Struggling with Conditional Text Merge Problem
The most powerful aspect of udts is that they can contain other udts, including dynamic udt arrays. Consider the following:
Code:
Option Explicit
Public Type EventType
    Type As String
    Description As String
    Priority As Long
End Type
Public Type PersonType
    Name() As EventType
    Birth() As EventType
    Death() As EventType
End Type
Private mtypPeople() As PersonType
This rudimentary (but flexible!) structure would allow you to add people to the People array, and each individual person would get their own redim-able Name event array, Birth event array, and Death event array. Each of those event arrays can hold any number of different types of events. For example, the Birth array can hold places, dates, whatever. And each of those can be assigned their own priority, allowing you to massage however you'd like.
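To give a feel for how the nested arrays grow, here is a small sketch of adding one birth entry to a person. It uses cut-down, hypothetical types (EventInfo, PersonInfo, with a member called Kind and a separate BirthCount) rather than the exact definitions above, and AddBirth is a made-up helper. As public types, these would need to live in a bas module.

```vb
Option Explicit

Public Type EventInfo
    Kind As String           ' e.g. "DATE" or "PLAC"
    Description As String
    Priority As Long
End Type

Public Type PersonInfo
    BirthCount As Long       ' how many elements of Birth() are in use
    Birth() As EventInfo     ' dynamic array nested inside the udt
End Type

' Appends one entry to the person's Birth() array.
Public Sub AddBirth(typPerson As PersonInfo, ByVal strKind As String, _
                    ByVal strDescription As String, ByVal lngPriority As Long)
    ' Grow the nested array by one element, keeping existing entries
    ReDim Preserve typPerson.Birth(typPerson.BirthCount)

    typPerson.Birth(typPerson.BirthCount).Kind = strKind
    typPerson.Birth(typPerson.BirthCount).Description = strDescription
    typPerson.Birth(typPerson.BirthCount).Priority = lngPriority

    typPerson.BirthCount = typPerson.BirthCount + 1
End Sub
```

Keeping a separate count is a design choice on my part: calling UBound on a dynamic array that has never been dimensioned raises a run-time error, so the counter avoids having to trap that for a person with no birth entries yet.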
Another powerful aspect of udts is that if you have a non-array udt, you can save it to a file on disk with a single line of code, and read it back from disk into memory with another single line of code. Because udts can contain other udts, what I do whenever I want to persist udts is dump them all into a "main" udt called Data. Then saving/loading all the memory at once is a simple matter of saving/loading Data. For example:
Code:
Option Explicit
Public Type EventType
    Type As String
    Description As String
    Priority As Long
End Type
Public Type PersonType
    Name() As EventType
    Birth() As EventType
    Death() As EventType
End Type
Private Type DataType
    People() As PersonType
End Type
Private data As DataType
Private Sub SaveData(pstrFile As String)
    Dim FileNumber As Long
    FileNumber = FreeFile()
    Open pstrFile For Binary As #FileNumber
    Put #FileNumber, 1, data
    Close #FileNumber ' close only this file; a bare Close closes every open file
End Sub
Private Sub LoadData(pstrFile As String)
    Dim FileNumber As Long
    FileNumber = FreeFile()
    Open pstrFile For Binary As #FileNumber
    Get #FileNumber, 1, data
    Close #FileNumber
End Sub
This is actual, functional code for saving and loading any data structure you can imagine to and from the hard drive with no thought required as to how big dynamic arrays are, how much space strings take up, or anything. It's as elegant as it gets.
There are several nuances I've learned the hard way that I'm happy to share, but that's essentially it.
-
Re: NEWBY Struggling with Conditional Text Merge Problem
The question is, what kind of data structure will serve the needs of the OP?
My question remains: How many different kinds of main events are there? Not sub-events, like PLAC, but primary events, like BIRT. In other words, how many different "1"s are there?
If there are only a handful, we can tailor a specific udt for each. If there are dozens, we'll want a single flexible data structure that can handle them all. So how many are there?
-
Re: NEWBY Struggling with Conditional Text Merge Problem
From the samples I've looked at nested UDTs might not cut it. The beast appears to be something closer to a redundantly serialized relational database. Even XML has no clean way to implement this since there are multiple possible hierarchies of entities. Different types of entities should have foreign-key links to other entity tables.
An INDI may be a FAMC (child) of a given unique FAM, as well as being a FAMS (spouse) listed in his WIFE's FAM. He can also be a CHIL of another FAM, and a HUSB in his own FAM. Many INDIs could have been at the same EVEN, etc.
So you have a number of one-to-one, one-to-many, many-to-one, and many-to-many relationships and the same INDI can be referenced within many other INDIs, etc. Then add in SOURs, PLACs, etc. etc.
Like the nested UDT case, you can do a SHAPE query using the Data Shaping Service to return a hierarchical Recordset: a "Recordset of Recordsets." These are usually best displayed using a TreeView or a Hierarchical Grid. But those are only views of the data, you can't store all of the data in a simple tree structure.
It appears that you can't even use a simple field for Birth Date, since there might be multiple reported dates of varying "quality" (and differing dates) just as with other attributes of an entity.
-
Re: NEWBY Struggling with Conditional Text Merge Problem
Quote:
Originally Posted by
Ellis Dee
My question remains: How many different kinds of main events are there? Not sub-events, like PLAC, but primary events, like BIRT. In other words, how many different "1"s are there?
Just to clear up a few questions, firstly in the file I am dealing with there are 99 different Event Types, worst case I saw recently had a lot of user created Event Types but still the total was only ~240
Quote:
Originally Posted by
dilettante
An INDI may be a FAMC (child) of a given unique FAM, as well as being a FAMS (spouse) listed in his WIFE's FAM. He can also be a CHIL of another FAM, and a HUSB in his own FAM. Many INDIs could have been at the same EVEN, etc.
You are quite correct that the Family relationship, defined after the Individual block within the "0 @F" tags, is a complex animal. I would not propose to deal with family events outside of the genealogy program itself; it is just too complicated, and changes would be impossible to validate.
-
Re: NEWBY Struggling with Conditional Text Merge Problem
Changes do sound like a challenge. You might not just have to edit the text or values in an entity; you might have to relink it to a different entity or change it to another type. You could have a daughter that turned out, upon more information, to be a son, a graduation that turned out to be something else, a given name that needs to be edited to be a surname, etc.
-
Re: NEWBY Struggling with Conditional Text Merge Problem
Quote:
Originally Posted by
Ellis Dee
In that case, here's a quick primer on UDTs. ...
Ellis
Sweet. Thanks .. :thumb:
EDIT:
Quote:
Due to their nature, udt arrays are self-documenting, unlike arrays where everything is an index and nothing makes sense without careful inspection.
My workaround is to use "captioned" flexgrids.
But your point is taken.
Spoo
-
Re: NEWBY Struggling with Conditional Text Merge Problem
Quote:
Originally Posted by
dilettante
From the samples I've looked at nested UDTs might not cut it. The beast appears to be something closer to a redundantly serialized relational database. Even XML has no clean way to implement this since there are multiple possible hierarchies of entities. Different types of entities should have foreign-key links to other entity tables.
I tend to agree; the more I think about the data, the more I think it doesn't fit into a relational database particularly. Recursive data is best held in a flat file structure, IMO. The key to that is how you display it, because the data itself is mostly a mess. Treeviews (as you mention) are ideal for this.
-
Re: NEWBY Struggling with Conditional Text Merge Problem
But a TreeView can only hold a "view" of the data based on the query you want to make at a given time.
I would think users might need to create trees on demand, for example asking about an individual exploring a "result tree" with that person at the root with branches taking you to his own info as well as ancestors, siblings, spouses (widows do remarry), children, etc. and their info. Trying to build it using a giant tree with Adam and Eve at the root would be unnavigable and the data on hand most likely has many branches appearing "out of nowhere" and marrying into families.
Your data would probably tend to have other things in there too, not just one person's family tree but several. Or "orphaned" data you're holding on suspicion until you have additional evidence to link it to a given tree.
The trees aren't simple either. Over several generations families might re-merge, producing a "braided" structure within the tree when "5th cousins" intermarry.
It looks darned complicated.
-
Re: NEWBY Struggling with Conditional Text Merge Problem
Not particularly. Putting it into a flat-file approach that can point back to itself in any number of ways is quite simple to set up. Then populating a treeview given an arbitrary starting point is equally simple.
An exhaustive treeview-style display of all the data would be overly complex, if not outright impossible, for all the reasons you mention.
A userdrawn approach might be better, and pretty cool. I can envision a picturebox drawing a stereotypical family tree type thing, where when you click on a name it gets centered and the tree grows out from it to the edges of the box and just stops. Limited data to display at any one time keeps it simple. Then you traverse the tree in any direction by clicking on a name, which in turns gets centered and the box redrawn, lather rinse repeat.
(Userdrawn in concept only; dynamic label and line controls would be easier than physically drawing everything.)
Maybe click a name to open a popup window with the detail of that person, ie birth date/place, etc..., and double-clicking centers the name for traversing the tree.
-
Re: NEWBY Struggling with Conditional Text Merge Problem
Unfortunately circumstances are preventing me moving forward on this project just now.
I have saved all the posts and just wanted to thank everyone who took time to post their views and ideas in my time of need and ignorance.
Whilst my original idea may have been very ambitious and difficult to achieve, I have come away with some very good ideas on how I can at least partly implement this, with a big gain for me. I also discovered WinMerge, which is a great way for me to check my logic and that I am achieving the desired results.
This forum and members are a great resource, I would like to think someday I could maybe give something back, who knows?
Thx again
-
Re: NEWBY Struggling with Conditional Text Merge Problem
Lando -
I'm new to this thread, so pardon me if you've already explained this.
Why are you going through this tortuous exercise -- attempting to parse a GEDCOM file? GEDCOMs are meant to exchange data between databases. Your best bet is to import it into a good database program, such as Wholly Genes' TMG program, which can then export the whole thing to an Excel format or other formats that can be imported directly into Access or other database-- if that's what you want. It does it very painlessly, and will preserve any Sureties that have been entered.
If you just want to see what you have, then use GenViewerLite, a good free program.
( http://www.mudcreeksoftware.com/genviewer_lite.htm )
-
Re: NEWBY Struggling with Conditional Text Merge Problem
I use a program called Rootsmagic V4 which is a very good program, however most programs are stretching the Gedcom standard a little so even importing to another platform packed with utilities would involve losing information.
Most genealogy programs I know of will not merge this type of duplication automatically as it may be valid disputed information still to be proved.
Having imported Gedcom files from various other researchers and programs, I have inherited a load of wrongly added information. If you are familiar with genealogy, you will also be familiar with problems like freehand notes in date fields, and occupations and other freehand text in the Place field.
Running through the gedcom, I have been able to build a number of small routines that look for this wrongly entered data and either query it to screen or automatically fix it. As you can imagine, this is tailored to what I see in front of me in the database concerned, but things so far have been working very well.
Identifying and merging duplicate events might be a bit ambitious at present, but I know I will be able to make progress here once my skill level improves and achieve a merge routine that merges at least some of these duplicates.
-
Re: NEWBY Struggling with Conditional Text Merge Problem
Lando:
Understand your problem. I've been doing genealogy for some years and know what you're facing. There's absolutely no package that will map sloppily entered data to the proper fields, nor reconcile discrepancies. I've learned how to handle the problem more or less manually, person by person, before I merge any new data from others with what I have. Sometimes this is merely a matter of using TMG's People Merge which gives me a chance to combine or ignore data, but when all's said and done, it's necessary to go through person by person to evaluate the data, reconcile the discrepancies, and put the sources into proper format. Gedcoms are convenient but not a good tool, and I try to avoid them. Fortunately, TMG can import other programs' databases directly without going through the Gedcom route, but you're still faced with the problem of others' bad data entry and accuracy. Good luck.
-
1 Attachment(s)
Re: NEWBY Struggling with Conditional Text Merge Problem
Lando
6rtury makes some very good points, but it's all kinda
greek to me as I have no GEDCOM experience at all.
So, in the event that you are a glutton for punishment ( :) ) and
are still interested in developing your own app, I spent a few
moments to write some code that scans your file (post #22)
and isolates all unique "1" tags -- in raw (original) order, and sorted.
See attached image for results.
(FWIW, they are presented in a MSHFlexGrid)
I hope it is of some use to you.
Spoo
.
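For readers without the attachment, the kind of scan described above can be sketched roughly as follows. This is a minimal illustration in Python, not the VB6 code behind the attached image; the function name `unique_level1_tags` and the sample lines (including the name) are made up for the example.

```python
def unique_level1_tags(lines):
    """Return (tags in first-seen order, the same tags sorted)."""
    seen = []
    for line in lines:
        parts = line.split()
        # A level-1 line looks like "1 BIRT" or "1 NAME John /Smith/".
        if len(parts) >= 2 and parts[0] == "1" and parts[1] not in seen:
            seen.append(parts[1])
    return seen, sorted(seen)


# Hypothetical sample data in the shape shown in the first post.
sample = [
    "0 @I1@ INDI",
    "1 NAME John /Smith/",
    "1 BIRT",
    "2 DATE 1868",
    "1 DEAT",
    "1 BIRT",
    "2 DATE Mar 1868",
]
raw_order, sorted_order = unique_level1_tags(sample)
print(raw_order)     # ['NAME', 'BIRT', 'DEAT']
print(sorted_order)  # ['BIRT', 'DEAT', 'NAME']
```

A plain list keeps the original order; for a very large file a set alongside the list would make the "already seen" check faster.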
-
Re: NEWBY Struggling with Conditional Text Merge Problem
Quote:
Originally Posted by Ellis Dee
The most powerful aspect of udts is that ...
Ellis
At the risk of hijacking this thread, I nonetheless tried
the following per your post #25.
1. In a .bas module
Code:
Public Type EventType
    Type As String
    Description As String
    Priority As String
End Type

Public Type PersonType
    Name() As EventType ' << compile error -- expected expression
    Birth() As EventType
    Death() As EventType
End Type

Private Type DataType
    People() As PersonType
End Type

Private Data As DataType
Changing Name() to Names() solves that issue.
Did I do something wrong?
Should Name() work?
2. In a Form
Code:
Private Sub LoadData(pstrFile As String)
    Dim FileNumber As Long
    FileNumber = FreeFile()
    Open pstrFile For Binary As #FileNumber
    Get #FileNumber, 1, Data ' << run-time error '458'
    Close
End Sub
.. where pstrFile = "D:\VBForum Stuff\Txt Files\lando geneo.txt"
I get "Variable uses an Automation type not supported by Visual Basic"
as the error description.
What am I missing here?
Spoo
-
Re: NEWBY Struggling with Conditional Text Merge Problem
My question is about the 'sorting' he wishes to do...
Is the 'date quality' basically a gauge of how reliable or accurate the date may be?
And you want to keep records with date reliability above a certain level?
I also assume you always want to work with a whole individual at a time (not separate BIRT records).
Side question... was this data format originally (or is it still) COBOL based? (I see no need to convert this file or move it to a DB just for your current problem. It is a completely valid and understandable format.)
Please feel free to correct me if my understanding is wrong,
but shouldn't this be much simpler than huge arrays and UDTs?
Those methods seem to want to read the file all at once.
How I would approach this:
Read an individual into a string, char by char. Use @ to mark the start/end of an individual (or the group of chars "0 @I1").
Work with the string. Since the string contains a single individual, you CAN parse it into a UDT as Ellis suggests. If the problem were larger I would use a UDT, but I don't think it's needed for this.
Output the string to a specific file, based on if it meets your quality check. If good enough, write to FileA, if not good enough, write to FileB
Read the next Individual.. etc etc
Since the source file is used read-only, even if things go south, you won't lose anything by keeping the file open the whole time.
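The steps above can be sketched roughly like this. It is a minimal illustration in Python rather than VB6, and it works line by line instead of char by char. Assumptions that are not in the thread: any line starting with "0 " ends the current individual, records that are not individuals are skipped, and `has_date` is a made-up stand-in for the real quality check. The two lists stand in for FileA and FileB.

```python
def iter_individuals(lines):
    """Yield one individual at a time as a list of lines."""
    block = None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("0 "):
            # Any level-0 line closes the individual being collected.
            if block is not None:
                yield block
            # Only "0 @I..." lines open a new individual (assumption).
            block = [line] if line.startswith("0 @I") else None
        elif block is not None:
            block.append(line)
    if block is not None:
        yield block


def split_by_quality(lines, passes_check):
    """Route each individual to one of two lists (FileA / FileB)."""
    good, rest = [], []
    for person in iter_individuals(lines):
        (good if passes_check(person) else rest).append(person)
    return good, rest


def has_date(person):
    # Hypothetical check: the individual has at least one DATE line.
    return any(l.startswith("2 DATE") for l in person)


sample = [
    "0 @I1@ INDI", "1 BIRT", "2 DATE 18 Mar 1868",
    "0 @I2@ INDI", "1 BIRT", "2 PLAC America",
    "0 @F1@ FAM",
]
file_a, file_b = split_by_quality(sample, has_date)
print(len(file_a), len(file_b))  # 1 1
```

Because `iter_individuals` is a generator, only one individual is held in memory at a time, so passing an open file object instead of a list works the same way.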
As far as determining if a date is good enough, My thought is to assign point values to the areas of the date that improve its quality.
1 for month, 1 for year, 2 for a Day, but only 1 if 'abt'
Then you can sort depending if intTotalScore is > 3 (for example)
The one downside to this is that the Day() function assumes 1 if no day is present, so you can't know whether it is the 1st or missing. I'll get back to you if I think of a better way. I still suggest a point system though.
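One possible way around the Day() problem is to score the text of the date directly instead of converting it to a date first, so a missing day is simply a missing token. The sketch below is in Python rather than VB6 and rests on assumptions: the name `score_date` is made up, months are expected as three-letter abbreviations, and "only 1 if 'abt'" is read as "a day is worth 1 instead of 2 when the date is marked abt". The approximate flag is returned separately so it can break ties.

```python
MONTHS = {"jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"}


def score_date(text):
    """Return (score, approximate) for a date such as '18 Mar 1868'."""
    tokens = text.lower().split()
    approximate = bool(tokens) and tokens[0].rstrip(".") == "abt"
    if approximate:
        tokens = tokens[1:]
    has_day = has_month = has_year = False
    for tok in tokens:
        if tok in MONTHS:
            has_month = True
        elif tok.isdigit() and len(tok) <= 2:
            has_day = True          # a day is only counted if it is written
        elif tok.isdigit() and len(tok) == 4:
            has_year = True
    score = 0
    if has_year:
        score += 1
    if has_month:
        score += 1
    if has_day:
        score += 1 if approximate else 2   # assumed meaning of the 'abt' rule
    return score, approximate


def rank_key(text):
    score, approximate = score_date(text)
    return (score, not approximate)   # exact dates win ties


dates = ["1868", "Mar 1868", "18 Mar 1868", "abt 1868"]
ranked = sorted(dates, key=rank_key, reverse=True)
print(ranked)  # ['18 Mar 1868', 'Mar 1868', '1868', 'abt 1868']
```

With these weights only a full day/month/year date scores above 3, and the four sample dates from the first post come out in the order the original poster ranked them.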