-
May 27th, 2005, 09:23 AM
#1
Thread Starter
Addicted Member
Need advice on removing duplicate lines in text file [Resolved]
Hey Guys,
I have been thinking about this problem and have not been able to get any relevant information from Google on it. I need to remove duplicate lines in a text file. Does anyone out there have any advice on how to accomplish this?
Last edited by Porsche944; May 27th, 2005 at 11:57 AM.
-
May 27th, 2005, 09:30 AM
#2
Re: Need advice on removing duplicate lines in text file
Depends how many lines you have. One way is to store the hashcode of each line in a collection. Keep adding them until the collection complains of a duplicate key. When that happens simply discard that line of the file.
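For illustration, here is a minimal Python sketch of that idea. It is an assumption-laden example, not necessarily what the poster had in mind: the function name `dedupe_by_hash` is made up, and a SHA-256 digest stands in for the "hashcode" so that two different lines are very unlikely to share a key (a short 32-bit hash code could collide and wrongly discard a distinct line).

```python
import hashlib


def dedupe_by_hash(lines):
    """Yield each distinct line once, remembering only a digest per line."""
    seen = set()  # holds fixed-size digests, not the full line text
    for line in lines:
        # Assumption: SHA-256 is used here in place of a language-level hash code.
        digest = hashlib.sha256(line.encode("utf-8")).digest()
        if digest in seen:
            continue  # duplicate key: discard this line
        seen.add(digest)
        yield line
```

Something like `list(dedupe_by_hash(["a", "b", "a"]))` keeps the first occurrence of each line and preserves the original order.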
I don't live here any more.
-
May 27th, 2005, 09:38 AM
#3
Thread Starter
Addicted Member
Re: Need advice on removing duplicate lines in text file
The number of lines can change, but I doubt it will ever be over 500. So what you're saying is I could create a hashtable and use each line as a key, and if the key already exists, just go on to the next line?
If so, that's some great thinking and should be easy to create. There is probably a better, more efficient way, but this should work great for now.
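A rough sketch of the hashtable version, written in Python purely as an example. The names `unique_lines` and `dedupe_file` and the UTF-8 encoding are assumptions for illustration. A `dict` plays the role of the hashtable: each line becomes a key, and a repeated key is simply ignored.

```python
def unique_lines(lines):
    """Return the lines with duplicates removed, keeping first occurrences."""
    # dict keys are unique and keep insertion order (Python 3.7+),
    # so a repeated line does not create a second entry.
    return list(dict.fromkeys(lines))


def dedupe_file(src_path, dst_path):
    """Copy src_path to dst_path, writing each distinct line only once."""
    with open(src_path, encoding="utf-8") as src:
        lines = src.read().splitlines()
    with open(dst_path, "w", encoding="utf-8") as dst:
        for line in unique_lines(lines):
            dst.write(line + "\n")
```

For a file of a few hundred lines, reading everything into memory like this should be perfectly adequate.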
-
May 27th, 2005, 11:53 AM
#4
Re: Need advice on removing duplicate lines in text file
Actually, there may not be a better way. The linear approach would be to load them all into an array. As each line is loaded, search all existing items for a duplicate. Since this would mean comparing strings, and doing so MANY times, it would not be very efficient.
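To make the cost concrete, here is a small Python sketch of that load-and-search approach (the name `dedupe_linear` is just for illustration). Every new line is compared against everything kept so far, so 500 distinct lines would need up to 500 × 499 / 2 = 124,750 string comparisons.

```python
def dedupe_linear(lines):
    """Keep a line only if a scan of the kept lines finds no match."""
    kept = []
    for line in lines:
        # "in" on a list walks it item by item, comparing strings each time.
        if line not in kept:
            kept.append(line)
    return kept
```

At this file size that is still quick in practice, but the work grows with the square of the line count.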
Another option would be to move the lines into the array, and sort it. All duplicates would be next to each other. You would then just need to go through the array one time, cleaning up. However, the sort would be somewhat slow because this is text, and you would have to do a comparison.
My understanding of hash tables is that they would be faster than this.
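The sort-then-clean-up option could look roughly like this in Python (again only a sketch, with `dedupe_sorted` as a made-up name). Note that, unlike the other approaches, the output comes back in sorted order rather than the original file order.

```python
def dedupe_sorted(lines):
    """Sort the lines, then drop any line equal to the one before it."""
    result = []
    for line in sorted(lines):
        # After sorting, duplicates sit next to each other,
        # so one comparison with the last kept line is enough.
        if not result or result[-1] != line:
            result.append(line)
    return result
```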
My usual boring signature: Nothing
 
-
May 27th, 2005, 11:57 AM
#5
Thread Starter
Addicted Member
Re: Need advice on removing duplicate lines in text file
That method worked perfectly. Not the best way to go about it, but hey, it works.
-
May 28th, 2005, 09:07 AM
#6
Re: Need advice on removing duplicate lines in text file
The reason I suggested using a HashCode was simply to save RAM. At the time, I wasn't thinking about hashtables specifically, but I guess that would be at least as good if not better than a collection.
I don't live here any more.