
Thread: Need advice on removing duplicate lines in text file [Resolved]

  1. #1

    Thread Starter
    Addicted Member Porsche944's Avatar
    Join Date
    Apr 2005
    Location
    Ann Arbor
    Posts
    182

    Need advice on removing duplicate lines in text file [Resolved]

    Hey Guys,
    I have been thinking about this problem and have not been able to get any relevant information from Google on it. I need to remove duplicate lines in a text file. Does anyone out there have any advice on how to accomplish this?
    Last edited by Porsche944; May 27th, 2005 at 11:57 AM.

  2. #2
    type Woss is new Grumpy; wossname's Avatar
    Join Date
    Aug 2002
    Location
    #!/bin/bash
    Posts
    5,682

    Re: Need advice on removing duplicate lines in text file

    Depends how many lines you have. One way is to store the hashcode of each line in a collection. Keep adding them until the collection complains of a duplicate key. When that happens simply discard that line of the file.
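A minimal sketch of this idea, written in Python only for illustration (the thread names no language, and the helper name `drop_duplicates_by_hash` is hypothetical). It stores a SHA-256 digest of each line instead of the line itself. A Python set does not complain about a duplicate key, so the membership test plays that role here. One caveat: with a short hash code, two different lines could collide and one of them would be wrongly discarded; a long digest makes that very unlikely.

```python
import hashlib


def drop_duplicates_by_hash(lines):
    """Return the lines in their original order, keeping the first copy of each."""
    seen = set()  # digests of the lines kept so far
    kept = []
    for line in lines:
        digest = hashlib.sha256(line.encode("utf-8")).digest()
        if digest in seen:
            continue  # duplicate key: discard this line
        seen.add(digest)
        kept.append(line)
    return kept
```

Only the fixed-size digests are kept in the collection, so its memory use does not grow with line length.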
    I don't live here any more.

  3. #3

    Thread Starter
    Addicted Member Porsche944's Avatar
    Join Date
    Apr 2005
    Location
    Ann Arbor
    Posts
    182

    Re: Need advice on removing duplicate lines in text file

    The number of lines can change, but I doubt it will ever be over 500. So what you're saying is that I could create a hashtable and write each line to the hashtable as a key, and if the key already exists, go on to the next line?

    If so, that's some great thinking and should be easy to create. There is probably a better, more efficient way, but this should work great for now.
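The hashtable version described in this post could look roughly like the following. This is a sketch under assumptions, not the code the poster ended up with: Python stands in for whatever language was actually used, a `dict` plays the hashtable, and `dedupe_file` and its two path parameters are hypothetical names.

```python
def dedupe_file(src_path, dst_path):
    """Copy src_path to dst_path, skipping lines that have already appeared."""
    seen = {}  # hashtable: line text -> True
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            key = line.rstrip("\n")
            if key in seen:
                continue  # key already exists: go on to the next line
            seen[key] = True
            dst.write(key + "\n")
```

The first occurrence of each line is kept and the original order is preserved. For a file of around 500 lines, holding every line in memory as a key is not a concern.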

  4. #4
    Super Moderator Shaggy Hiker's Avatar
    Join Date
    Aug 2002
    Location
    Idaho
    Posts
    40,106

    Re: Need advice on removing duplicate lines in text file

    Actually, there may not be a better way. The linear-search approach would be to load them all into an array. As each is loaded, search all existing items for a duplicate. Since this would mean comparing strings, and doing so MANY times, it would not be very efficient.
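For comparison, the array-and-search approach might be sketched like this (Python chosen for illustration; `dedupe_linear_search` is a hypothetical name). Each new line is compared against every line kept so far, so n distinct lines cost up to n(n-1)/2 string comparisons: at most 124,750 for 500 lines.

```python
def dedupe_linear_search(lines):
    """Keep the first copy of each line, checking duplicates by scanning a list."""
    kept = []
    for line in lines:
        if line not in kept:  # linear scan over everything kept so far
            kept.append(line)
    return kept
```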

    Another option would be to move the lines into the array, and sort it. All duplicates would be next to each other. You would then just need to go through the array one time, cleaning up. However, the sort would be somewhat slow because this is text, and every step of the sort is a string comparison.
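The sort-then-clean-up option could be sketched as follows (again in Python as an assumed language, with the hypothetical name `dedupe_sorted`). After sorting, equal lines sit next to each other, so one pass that compares each line with the last one kept is enough.

```python
def dedupe_sorted(lines):
    """Sort the lines, then drop each line that equals the one before it."""
    ordered = sorted(lines)
    kept = []
    for line in ordered:
        if not kept or line != kept[-1]:
            kept.append(line)
    return kept
```

Unlike the other approaches, this one does not preserve the original line order, which matters if the file's order is meaningful.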

    My understanding of hash tables is that they would be faster than this.
    My usual boring signature: Nothing

  5. #5

    Thread Starter
    Addicted Member Porsche944's Avatar
    Join Date
    Apr 2005
    Location
    Ann Arbor
    Posts
    182

    Re: Need advice on removing duplicate lines in text file

    That method worked perfectly. Not the best way to go about it, but hey, it works.

  6. #6
    type Woss is new Grumpy; wossname's Avatar
    Join Date
    Aug 2002
    Location
    #!/bin/bash
    Posts
    5,682

    Re: Need advice on removing duplicate lines in text file

    The reason I suggested using a HashCode was simply to save RAM. At the time, I wasn't thinking about hashtables specifically, but I guess that would be at least as good if not better than a collection.
    I don't live here any more.
