Thread: An Algorithm... finding duplicates in a file...please help!

  1. #1

    Thread Starter
    Member
    Join Date
    Mar 2000
    Posts
    47
    Hi!

    I'm using VB 6.0 on Win98. I'm trying to write an algorithm that opens a file (text or binary), finds all the duplicate strings in it, and lists them in a listbox. I also want the program to list the corresponding locations of the duplicate strings in the file. This is quite hard, which is why I'm posting my question here, to get answers from the experts.

    Any ideas are welcome, and please ask if you need further details.

    RiebBo

    P.S. The algorithm should ignore spaces when comparing strings, but still count them when working out the locations.


  2. #2
    Guest

    More Info...

    When you say "duplicate strings", you mean strings of complete words (not the hon from "phone" and "honest"), right?

  3. #3
    Fanatic Member
    Join Date
    Feb 2000
    Location
    Japan
    Posts
    840
    I can think of a lot of slow ways of doing this and only one 'so-so' way.

    Load the text into an array of words, sort them, and compare neighbours in the array. Once you have the matched words, do a quick search of the original text to get the positions.
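
    The sort-and-compare idea above could be sketched roughly like this in VB6. Every routine name here is hypothetical, words are assumed to be separated by spaces only, and matching is assumed to be case-insensitive. Instead of searching the original text again, the sketch carries each word's 1-based position along in a parallel array, so spaces are skipped for comparison but still counted in the locations.

    ```vb
    Option Explicit

    ' Hypothetical helper: splits txt on spaces and records the 1-based
    ' character position at which each word starts.
    Private Sub SplitWithPositions(ByVal txt As String, words() As String, _
                                   pos() As Long, count As Long)
        Dim i As Long, start As Long, ch As String
        ReDim words(0 To Len(txt))
        ReDim pos(0 To Len(txt))
        count = 0
        start = 0
        For i = 1 To Len(txt) + 1          ' one step past the end flushes the last word
            If i <= Len(txt) Then ch = Mid$(txt, i, 1) Else ch = " "
            If ch = " " Then
                If start > 0 Then
                    words(count) = Mid$(txt, start, i - start)
                    pos(count) = start
                    count = count + 1
                    start = 0
                End If
            ElseIf start = 0 Then
                start = i
            End If
        Next i
    End Sub

    ' Hypothetical helper: stable insertion sort of the parallel arrays by word.
    ' Simple rather than fast; a quicksort would suit large files better.
    Private Sub SortPairs(words() As String, pos() As Long, ByVal count As Long)
        Dim i As Long, j As Long, w As String, p As Long
        For i = 1 To count - 1
            w = words(i)
            p = pos(i)
            j = i - 1
            Do While j >= 0
                ' Nested test because VB6's And does not short-circuit.
                If StrComp(words(j), w, vbTextCompare) <= 0 Then Exit Do
                words(j + 1) = words(j)
                pos(j + 1) = pos(j)
                j = j - 1
            Loop
            words(j + 1) = w
            pos(j + 1) = p
        Next i
    End Sub

    ' Adds every occurrence of every repeated word to the listbox.
    Public Sub ListDuplicates(ByVal txt As String, lst As ListBox)
        Dim words() As String, pos() As Long
        Dim n As Long, i As Long, dup As Boolean
        SplitWithPositions txt, words, pos, n
        SortPairs words, pos, n
        For i = 0 To n - 1
            ' After sorting, a repeated word equals a neighbour on one side.
            dup = False
            If i > 0 Then dup = (StrComp(words(i), words(i - 1), vbTextCompare) = 0)
            If Not dup And i < n - 1 Then
                dup = (StrComp(words(i), words(i + 1), vbTextCompare) = 0)
            End If
            If dup Then lst.AddItem words(i) & " @ " & pos(i)
        Next i
    End Sub
    ```

    Calling something like ListDuplicates Text1.Text, List1 (control names assumed) would fill the listbox with one "word @ position" entry per occurrence of each repeated word.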

    Binary... ouch. There are no defined strings or words, so you'd have to compare every substring length from one up to half the byte array length against a copy of the data, which is very slow.
    If you're after text strings, then you could use non-letter ASCII codes as delimiters and search the bits in between.
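
    As a rough illustration of the delimiter idea, the sketch below reads a file as raw bytes and treats every byte that is not an ASCII letter as a delimiter. The routine names and the minLen cut-off are assumptions made for the example, not something specified in the thread. It prints each run of letters with its 0-based byte offset; those runs could then be sorted and compared like ordinary words.

    ```vb
    Option Explicit

    ' Hypothetical helper: True for ASCII A-Z or a-z.
    Private Function IsLetter(ByVal b As Byte) As Boolean
        IsLetter = (b >= 65 And b <= 90) Or (b >= 97 And b <= 122)
    End Function

    ' Prints every run of at least minLen letters with its byte offset.
    Public Sub DumpLetterRuns(ByVal path As String, ByVal minLen As Long)
        Dim f As Integer, bytes() As Byte
        Dim i As Long, n As Long, start As Long
        Dim run As String, letter As Boolean

        f = FreeFile
        Open path For Binary Access Read As #f
        n = LOF(f)
        If n > 0 Then
            ReDim bytes(0 To n - 1)
            Get #f, , bytes              ' read the whole file into the array
        End If
        Close #f

        start = -1                       ' -1 means "not inside a run"
        For i = 0 To n                   ' one step past the end flushes the last run
            letter = False
            If i < n Then letter = IsLetter(bytes(i))
            If letter Then
                If start < 0 Then start = i: run = ""
                run = run & Chr$(bytes(i))
            ElseIf start >= 0 Then
                If Len(run) >= minLen Then Debug.Print start, run
                start = -1
            End If
        Next i
    End Sub
    ```

    A minLen of 3 or 4 keeps random one- and two-letter byte pairs in binary data from flooding the output.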

    How specific is this?

    Does "at" match "sat" as a two-letter text string?

    Paul Dwyer
    Network Engineer
    Aussie In Tokyo

    Using Powerbasic 6 & VB6 SP4 (Please also add your VB Version to your signature!)

  4. #4
    Member
    Join Date
    Dec 1999
    Posts
    41
    Hey,
    Here is what you can do. Create a Word.Basic object and open the document in Word, then use the Find and Replacement objects that Word exposes from your VB program. Make sure you work on a copy of the TXT file you want to search for duplicates. After finding a word, you can replace all occurrences of it with "" (an empty string) so that you don't count them again. I'm pasting the contents of the Help file below.
    *********************************************************

    FIND Object
    ------------


    Represents the criteria for a find operation. The properties and methods of the Find object correspond to the options in the Find and Replace dialog box.

    Using the Find Object

    Use the Find property to return a Find object. The following example finds and selects the next occurrence of the word "hi."

    With Selection.Find
        .ClearFormatting
        .Text = "hi"
        .Execute Forward:=True
    End With

    The following example finds all occurrences of the word "hi" in the active document and replaces the word with "hello."

    Set myRange = ActiveDocument.Content
    myRange.Find.Execute FindText:="hi", ReplaceWith:="hello", _
    Replace:=wdReplaceAll

    Remarks

    If you've gotten to the Find object from the Selection object, the selection is changed when text matching the find criteria is found. The following example selects the next occurrence of the word "blue."

    Selection.Find.Execute FindText:="blue", Forward:=True

    If you've gotten to the Find object from the Range object, the selection isn't changed when text matching the find criteria is found, but the Range object is redefined. The following example locates the first occurrence of the word "blue" in the active document. If "blue" is found in the document, myRange is redefined and bold formatting is applied to "blue."

    Set myRange = ActiveDocument.Content
    myRange.Find.Execute FindText:="blue", Forward:=True
    If myRange.Find.Found = True Then myRange.Bold = True






    Finding and replacing is exposed by the Find and Replacement objects. The Find object is available from the Selection and Range object. The find action differs slightly depending upon whether you access the Find object from the Selection or Range object.

    Finding text and selecting it

    If the Find object is accessed from the Selection object, the selection is changed when the find criteria is found. The following example selects the next occurrence of the word "Hello." If the end of the document is reached before the word "Hello" is found, the search is stopped.

    With Selection.Find
        .Forward = True
        .Wrap = wdFindStop
        .Text = "Hello"
        .Execute
    End With

    The Find object includes properties that relate to the options in the Find and Replace dialog box (choose Find from the Edit menu). You can set the individual properties of the Find object or use arguments with the Execute method as shown in the following example.

    Selection.Find.Execute FindText:="Hello", Forward:=True, Wrap:=wdFindStop

    Finding text without changing the selection

    If the Find object is accessed from a Range object, the selection is not changed but the Range is redefined when the find criteria is found. The following example locates the first occurrence of the word "blue" in the active document. If the find operation is successful, the range is redefined and bold formatting is applied to the word "blue."

    With ActiveDocument.Content.Find
        .Text = "blue"
        .Forward = True
        .Execute
        If .Found = True Then .Parent.Bold = True
    End With

    The following example performs the same result as the previous example using arguments of the Execute method.

    Set myRange = ActiveDocument.Content
    myRange.Find.Execute FindText:="blue", Forward:=True
    If myRange.Find.Found = True Then myRange.Bold = True

    Using the Replacement object

    The Replacement object represents the replace criteria for a find and replace operation. The properties and methods of the Replacement object correspond to the options in the Find and Replace dialog box (Edit menu).

    The Replacement object is available from the Find object. The following example replaces all occurrences of the word "hi" with "hello." The selection changes when the find criteria is found because the Find object is accessed from the Selection object.

    With Selection.Find
        .ClearFormatting
        .Text = "hi"
        .Replacement.ClearFormatting
        .Replacement.Text = "hello"
        .Execute Replace:=wdReplaceAll, Forward:=True, Wrap:=wdFindContinue
    End With

    The following example removes bold formatting in the active document. The Bold property is True for the Find object and False for the Replacement object. In order to find and replace formatting, set the find and replace text to empty strings ("") and set the Format argument of the Execute method to True. The selection remains unchanged because the Find object is accessed from a Range object (the Content property returns a Range object).

    With ActiveDocument.Content.Find
        .ClearFormatting
        .Font.Bold = True
        With .Replacement
            .ClearFormatting
            .Font.Bold = False
        End With
        .Execute FindText:="", ReplaceWith:="", Format:=True, Replace:=wdReplaceAll
    End With
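
    To get at the locations the original question asks for, a Range-based Find can be repeated in a loop, reading the range's Start after each hit. The following is only a sketch under several assumptions: it late-binds to Word.Application (the VBA object model the Help text describes) rather than Word.Basic, the routine and parameter names are made up for illustration, and the two numeric constants stand in for wdFindStop and wdCollapseEnd because late binding does not pull in Word's named constants.

    ```vb
    Option Explicit

    ' Stand-ins for Word's wdFindStop and wdCollapseEnd constants.
    Private Const WD_FIND_STOP As Long = 0
    Private Const WD_COLLAPSE_END As Long = 0

    ' Hypothetical routine: lists the start offset of every whole-word
    ' match of target in the document at path.
    Public Sub ListWordPositions(ByVal path As String, ByVal target As String, _
                                 lst As ListBox)
        Dim wordApp As Object, doc As Object, rng As Object

        Set wordApp = CreateObject("Word.Application")
        Set doc = wordApp.Documents.Open(FileName:=path, ReadOnly:=True)
        Set rng = doc.Content

        ' Each successful Execute redefines rng to cover the match.
        Do While rng.Find.Execute(FindText:=target, MatchWholeWord:=True, _
                                  Forward:=True, Wrap:=WD_FIND_STOP)
            lst.AddItem target & " @ " & rng.Start
            rng.Collapse WD_COLLAPSE_END   ' continue searching after this match
        Loop

        doc.Close False                    ' close without saving changes
        wordApp.Quit
    End Sub
    ```

    Note that Start is a character offset within the Word document, so it will not necessarily match the byte offset in the original text file (for example, where line endings are converted to paragraph marks).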
