
Thread: Algorithm to merge files

  1. #1

    Thread Starter
    Fanatic Member mutley's Avatar
    Join Date
    Apr 2000
    Location
    Sao Paulo - Brazil
    Posts
    709

    Question Algorithm to merge files

    Hi

    I would like to build an algorithm that merges records of a text file when two records differ in no more than 2 character positions.

    Characteristics of the records:

    1) The size is always 14 characters
    2) The possible characters are: 1, 2, 3, 4, 5, 6, 7
    3) Usually only the characters 1, 2 and/or 4 occur

    Should I read the file in sequential order and, for each record, find all the records that differ from it in one or two characters?
    Example of records:
    [pre]
    42444442414411
    42444441414211
    42444441214411
    42444421414411
    42424441414411
    42244441414411
    41444442424411
    41444442414421
    41444442414412

    [/pre]
    When a record is found that differs in 2 characters or fewer, the digits that differ should be added together and one of the two records eliminated.
    Example:
    42444442414411 and 42444441414211

    The two records merge into: 42444443414611

    As this record has already been summed, it is discarded in later comparisons.
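
    A minimal Python sketch of the comparison and merge described above. The function names are illustrative and not part of the original post; it only assumes that both records have the same length and contain decimal digits.

```python
def differing_positions(a: str, b: str) -> list[int]:
    """Return the 0-based positions at which two equal-length records differ."""
    if len(a) != len(b):
        raise ValueError("records must have the same length")
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


def merge_by_sum(a: str, b: str) -> str:
    """Keep matching characters; replace each differing pair by the sum of its digits."""
    return "".join(x if x == y else str(int(x) + int(y)) for x, y in zip(a, b))


first, second = "42444442414411", "42444441414211"
print(differing_positions(first, second))  # [7, 11] (8th and 12th characters)
print(merge_by_sum(first, second))         # 42444443414611
```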

    What is the best way to do this when reading from a text file?
    Last edited by mutley; Mar 27th, 2015 at 01:11 PM.

  2. #2
    VB-aholic & Lovin' It LaVolpe's Avatar
    Join Date
    Oct 2007
    Location
    Beside Waldo
    Posts
    19,541

    Re: Algorithm to merge files

    what happens if the 2 numbers that are different sum to a value > 7 or > 10?

    Can these records that will be compared exist anywhere in the file or only next to each other in the file?
    Insomnia is just a byproduct of, "It can't be done"

    Classics Enthusiast? Here's my 1969 Mustang Mach I Fastback. Her sister '67 Coupe has been adopted

    Newbie? Novice? Bored? Spend a few minutes browsing the FAQ section of the forum.
    Read the HitchHiker's Guide to Getting Help on the Forums.
    Here is the list of TAGs you can use to format your posts
    Here are VB6 Help Files online


    {Alpha Image Control} {Memory Leak FAQ} {Unicode Open/Save Dialog} {Resource Image Viewer/Extractor}
    {VB and DPI Tutorial} {Manifest Creator} {UserControl Button Template} {stdPicture Render Usage}

  3. #3
    PowerPoster
    Join Date
    Feb 2012
    Location
    West Virginia
    Posts
    14,205

    Re: Algorithm to merge files

    So what would need to happen after you merge the two, would it continue to compare the original for all the lines in the file? Would it add to that new number? Can you have 8,9 and 0 digits in the output?

    A little more detail please.

  4. #4

    Thread Starter
    Fanatic Member mutley's Avatar
    Join Date
    Apr 2000
    Location
    Sao Paulo - Brazil
    Posts
    709

    Re: Algorithm to merge files

    Originally posted by LaVolpe:
    what happens if the 2 numbers that are different sum to a value > 7 or > 10?

    Can these records that will be compared exist anywhere in the file or only next to each other in the file?
    Thank you for your answer.

    The sum will never be greater than 7; the records generally contain only the digits 1, 2 and 4.

    Suppose I have a text file that I read sequentially. I read the first record and look for the first record that differs from it in 2 characters or fewer. I then merge these two records by adding the differing digits and write the result to another file.

    42444442414411
    42444441414211
    42444441214411
    42444421414411
    42424441414411
    42244441414411
    41444442424411
    41444442414421
    41444442414412

    42444442414411 First record
    42444441414211 second record

    I merge them into 42444443414611 and save it in the other file. Then I must read the third record, because the first was read and the second was merged. Suppose it was not the second record that differed in 2 characters or fewer, but the fifth; then this fifth record should be excluded from subsequent reads.

  5. #5
    VB-aholic & Lovin' It LaVolpe's Avatar
    Join Date
    Oct 2007
    Location
    Beside Waldo
    Posts
    19,541

    Re: Algorithm to merge files

    What DM and I are a bit confused about is that you said the records can contain digits from 1 to 7, so theoretically 42444442414411 & 42444442414611 would yield a sum of 10. Also, DM was asking how you know that you are not comparing a record that was already merged. When the record is merged and saved to another file, are unmerged records also saved to that file? Are you trying to merge multiple files too?
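
    To make the overflow concern concrete, here is a small Python sketch (the function name is hypothetical) that merges by summing and refuses any sum that no longer fits in a single character. Raising an error is only one possible policy; the thread does not say what should happen in that case.

```python
def checked_merge(a: str, b: str) -> str:
    """Merge two records by summing differing digits; fail if a sum needs two characters."""
    out = []
    for pos, (x, y) in enumerate(zip(a, b), start=1):
        if x == y:
            out.append(x)
            continue
        total = int(x) + int(y)
        if total > 9:
            # The merged record could no longer stay 14 characters long.
            raise ValueError(f"position {pos}: {x} + {y} = {total} does not fit in one character")
        out.append(str(total))
    return "".join(out)


try:
    checked_merge("42444442414411", "42444442414611")
except ValueError as err:
    print(err)  # position 12: 4 + 6 = 10 does not fit in one character
```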

  6. #6

    Thread Starter
    Fanatic Member mutley's Avatar
    Join Date
    Apr 2000
    Location
    Sao Paulo - Brazil
    Posts
    709

    Re: Algorithm to merge files

    Originally posted by LaVolpe:
    What DM and I are a bit confused about is that you said the records can contain digits from 1 to 7, so theoretically 42444442414411 & 42444442414611 would yield a sum of 10. Also, DM was asking how you know that you are not comparing a record that was already merged. When the record is merged and saved to another file, are unmerged records also saved to that file? Are you trying to merge multiple files too?
    Take the two records below:
    42444442414411
    42444441414211

    They differ only in the eighth and twelfth characters, so the sums will be (2+1) and (4+2):
    42444443414611

    The original file almost always contains only 1, 2 and 4; anything else will be an exception, which is easy to handle manually.

  7. #7
    PowerPoster
    Join Date
    Feb 2012
    Location
    West Virginia
    Posts
    14,205

    Re: Algorithm to merge files

    I still don't know what you are really trying to do. I am assuming that your file has a lot more than just 2 records in it, but all your examples deal with only two records, so I can't tell how to process the file based on that limited info.

  8. #8
    Sinecure devotee
    Join Date
    Aug 2013
    Location
    Southern Tier NY
    Posts
    6,582

    Re: Algorithm to merge files

    My interpretation.
    Start with a list of strings containing the digits 1, 2 and 4. If there are any other digits in a string, it won't match any other string within two characters, so you don't need to be concerned with those.

    1) Read the first item from the list.
    2) Compare it to each of the following items in the list until you find one that differs by only one or two characters.
    3) Add the digits that are different together to create a new number and write that to the output file.
    4) Remove the two combined items from the first list.
    5) Start again with the first item in the list.
    6) If you make it through the list with no close matches to the first number, start the process over again, but starting with the second item in the first list.
    7) Repeat the above until your starting item from the first list is your last item in the first list. You've then removed all closely matching pairs.

    Based on that, you should end up with a second file holding some number of merged pairs (each pair removed from the first list), and a first list holding all the remaining values, none of which is within one or two characters of any other value in that list.
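
    The interpretation above could be sketched in Python roughly as follows. This is only one reading of the requirements, done entirely in memory; `pair_off` and its helpers are illustrative names, and the sample list is the one from post #1.

```python
def diff_count(a: str, b: str) -> int:
    """Number of positions at which two equal-length records differ."""
    return sum(x != y for x, y in zip(a, b))


def merge_by_sum(a: str, b: str) -> str:
    """Keep matching characters; sum the digits where the records differ."""
    return "".join(x if x == y else str(int(x) + int(y)) for x, y in zip(a, b))


def pair_off(records):
    """Return (merged pairs, leftover records) following the steps listed above."""
    remaining = list(records)
    merged = []
    start = 0
    while start < len(remaining) - 1:
        base = remaining[start]
        # First later record that differs in exactly one or two positions.
        match = next(
            (j for j in range(start + 1, len(remaining))
             if 1 <= diff_count(base, remaining[j]) <= 2),
            None,
        )
        if match is None:
            start += 1  # no close match: move on to the next starting item
        else:
            merged.append(merge_by_sum(base, remaining[match]))
            del remaining[match]  # delete the higher index first
            del remaining[start]
            # `start` is unchanged. Items skipped earlier had no match before
            # the removal, so they cannot have one now; restarting from the
            # very first item would give the same result.
    return merged, remaining


records = [
    "42444442414411", "42444441414211", "42444441214411",
    "42444421414411", "42424441414411", "42244441414411",
    "41444442424411", "41444442414421", "41444442414412",
]
merged, remaining = pair_off(records)
print(merged[0])   # 42444443414611
print(remaining)   # ['41444442414412']
```

    In a real program `merged` would be written to the output file and `remaining` would replace (or be compared against) the input list.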

    Why? Who knows. It smacks of a compression scheme.
    Last edited by passel; Mar 27th, 2015 at 04:29 PM.

  9. #9

    Thread Starter
    Fanatic Member mutley's Avatar
    Join Date
    Apr 2000
    Location
    Sao Paulo - Brazil
    Posts
    709

    Re: Algorithm to merge files

    Originally posted by DataMiser:
    I still don't know what you are really trying to do. I am assuming that your file has a lot more than just 2 records in it, but all your examples deal with only two records, so I can't tell how to process the file based on that limited info.
    Thank you. It works like bitwise operations:

    1 ==> 001
    2 ==> 010
    4 ==> 100

    Then, using the OR operator:
    1 OR 2 ==> 011, which equals 3
    2 OR 4 ==> 110, which equals 6
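
    Read that way, the merge can be sketched in Python with the `|` operator, treating every digit 1 to 7 as a 3-bit mask (the function name is illustrative). For two different digits taken from 1, 2 and 4 the OR equals the sum, and an OR of 3-bit values can never exceed 7. For digits that share a bit the two operations disagree, as the last line shows.

```python
def merge_by_or(a: str, b: str) -> str:
    """Merge two records by OR-ing the digits at each position (x | x == x)."""
    return "".join(str(int(x) | int(y)) for x, y in zip(a, b))


print(merge_by_or("42444442414411", "42444441414211"))  # 42444443414611
print(1 | 2, 2 | 4)  # 3 6
print(3 | 1, 3 + 1)  # 3 4  (OR and sum differ once a bit is shared)
```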

    My problem is how to read a text file, find all the strings that differ in 2 characters or fewer, choose one of them, save the merge, and then continue reading the text file while skipping the records already joined. I am not using a database, only a data file.

  10. #10
    Sinecure devotee
    Join Date
    Aug 2013
    Location
    Southern Tier NY
    Posts
    6,582

    Re: Algorithm to merge files

    How big is the file?
    The best speed will be accomplished if you can read the whole list in memory.
    For a similar task, I created a second array that acts like a linked list over the array of strings.
    Access may be slower at first because of the indirect indexing needed to reach an item in the array, but as items are merged and "removed" from the linked list, things should speed up: you don't have to test a flag on every entry to find the next non-merged value (as you would if you marked merged entries with a flag), and you don't spend time compacting the list to eliminate merged entries.
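
    A rough Python sketch of that idea, with hypothetical names: the strings stay where they are, and a parallel array of "next" indexes is relinked to skip removed entries. It is not the code from the similar task mentioned above, only an illustration of the structure.

```python
END = -1  # sentinel meaning "no further item"


class LinkedRecords:
    """Array of strings plus a parallel array of next-indexes acting as a linked list."""

    def __init__(self, records):
        self.records = list(records)
        n = len(self.records)
        self.next = [i + 1 for i in range(n)]
        if n:
            self.next[-1] = END
        self.head = 0 if n else END

    def unlink(self, prev: int, cur: int) -> None:
        """Remove item `cur`; `prev` is its predecessor, or END if `cur` is the head."""
        if prev == END:
            self.head = self.next[cur]
        else:
            self.next[prev] = self.next[cur]

    def __iter__(self):
        i = self.head
        while i != END:
            yield self.records[i]
            i = self.next[i]


items = LinkedRecords(["A", "B", "C", "D"])
items.unlink(prev=0, cur=1)    # remove "B"
items.unlink(prev=END, cur=0)  # remove "A", which is the head
print(list(items))             # ['C', 'D']
```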

    If the file is too large to fit into memory, are the lines really consistently sized so that the beginning of each line is guaranteed to be calculable by simply multiplying a fixed value by index to get to the line?
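
    If the lines really are fixed-size, the offset calculation could look like this in Python. The 2-byte CR LF line ending and the ASCII encoding are assumptions, not something stated in the thread, and `io.BytesIO` stands in for a file opened in binary mode.

```python
import io

RECORD_LEN = 14   # characters per record, as stated in post #1
NEWLINE_LEN = 2   # assumption: every line ends with CR LF


def read_record(f, index: int) -> str:
    """Read the record at 0-based `index` from a binary file of fixed-size lines."""
    f.seek(index * (RECORD_LEN + NEWLINE_LEN))
    return f.read(RECORD_LEN).decode("ascii")


# In-memory stand-in for open(path, "rb")
sample = io.BytesIO(
    b"42444442414411\r\n"
    b"42444441414211\r\n"
    b"42444441214411\r\n"
)
print(read_record(sample, 2))  # 42444441214411
```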

    Above all other questions, after re-reading this thread multiple times, I'm still not sure that my interpretation in post #8 is correct. The question is what DataMiser asked in post #3,
    "So what would need to happen after you merge the two, would it continue to compare the original for all the lines in the file?..."
    and it still wasn't really addressed in your response in post #4,
    "42444442414411 First record
    42444441414211 second record

    I merge to 42444443414611 save in other file, then I must to read the third record , because the First was read and the second was merged ."
    You say you must read the third in the case that the 1st was merged with the second and the second removed, but it isn't clear whether you're reading the third to compare to the first, or reading the third as the new "first" to compare to the rest of the file until a close match is found.

    ReQuestion:
    So, is the desire to compare the first item to all the remaining items in the file, merging it with all close matches, writing those merges in order to a second file and removing each matched entry from the first file? And only after all close matches to the first item have been found, merged and removed, do you move on to the next remaining item in the first file and compare it to all the following remaining items (again merging, removing from the first file, and appending the merge to the second file)?

    Apparently the original order isn't important in the second file, as you would be moving all close matches to the first item to the front of the second file, followed as a group by all close matches to the next item (one with more than two differences from the previous entry in the first file). Seems like an odd thing.
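
    For comparison with post #8, this alternative reading could be sketched in Python as below. It assumes each close match is merged against the original, unmodified base item and that the base itself stays in the first list; the thread does not settle either point, and the names are illustrative.

```python
def diff_count(a: str, b: str) -> int:
    """Number of positions at which two equal-length records differ."""
    return sum(x != y for x, y in zip(a, b))


def merge_by_sum(a: str, b: str) -> str:
    """Keep matching characters; sum the digits where the records differ."""
    return "".join(x if x == y else str(int(x) + int(y)) for x, y in zip(a, b))


def merge_all_matches(records):
    """Merge every close match of each base item; return (merges, leftover records)."""
    remaining = list(records)
    merged = []
    i = 0
    while i < len(remaining):
        base = remaining[i]
        kept = remaining[: i + 1]  # the base and everything before it stay
        for other in remaining[i + 1:]:
            if 1 <= diff_count(base, other) <= 2:
                merged.append(merge_by_sum(base, other))  # matched entry is dropped
            else:
                kept.append(other)
        remaining = kept
        i += 1
    return merged, remaining


records = [
    "42444442414411", "42444441414211", "42444441214411",
    "42444421414411", "42424441414411", "42244441414411",
    "41444442424411", "41444442414421", "41444442414412",
]
merged, remaining = merge_all_matches(records)
print(len(merged))  # 8
print(remaining)    # ['42444442414411']
```

    With the sample from post #1, every other record is within two characters of the first one, so this reading yields eight merges where the post #8 reading yields four, which is why the answer matters.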

    I guess, first answer the "ReQuestion:" above so we know whether post #8 is a wrong interpretation of the requirements.
